Optimizers
1. Optimizers
In chapter 2, you minimized a loss function with an optimizer. We'll revisit that here in the context of training neural networks. This entails finding the set of weights that corresponds to the minimum value of the loss.
2. How to find a minimum
So what is a minimization problem? And what can go wrong when we try to solve one? Let's start with a simple thought experiment: you want to find the lowest point in the Grand Canyon, but all you can do is pick a point, measure the elevation, and then repeat the process at nearby points. This is what you do when you train a neural network: you pick a starting point, measure the loss, and then try to move to a point with a lower loss. We will see how a common optimization algorithm, gradient descent, solves this problem.
3. How to find a minimum
Let's start by picking a point and measuring the elevation. From that point, we'll move along the slope until we arrive on a flat surface. To understand what's going on, imagine you dropped a ball into the canyon from the point you selected. If you drop the ball on a slope above a plateau, the ball will stop when it reaches the plateau. If this happens, the gradient descent algorithm fails: it stops at a local minimum and progresses no further.
4. How to find a minimum
Let's say you pick a different spot. This time, the ball lands on a slope with an unobstructed path to the lowest point in the canyon. Here, the gradient descent algorithm works and the ball reaches the global minimum. Notice that gravity performs the role of the gradient descent optimizer.
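To make the ball analogy concrete, here is a minimal sketch of plain gradient descent in TensorFlow. The toy elevation function, starting point, and learning rate are assumptions chosen for illustration, not the course's code.

import tensorflow as tf

# A toy "canyon": a local minimum near x = -0.69 and a global
# minimum near x = 2.19 (function chosen for illustration).
def elevation(x):
    return x**4 - 2*x**3 - 3*x**2

x = tf.Variable(-2.0)  # the point where we "drop the ball"
learning_rate = 0.01

for step in range(200):
    with tf.GradientTape() as tape:
        loss = elevation(x)
    grad = tape.gradient(loss, x)       # the slope at the current point
    x.assign_sub(learning_rate * grad)  # take a step downhill

print(x.numpy())  # settles near -0.69: stuck in the local minimum

Started at x = -2, the loop settles in the local minimum near x = -0.69; dropped to the right of the local maximum at x = 0, it would instead roll down to the global minimum near x = 2.19, mirroring the two scenarios above.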
5. Stochastic gradient descent
Stochastic gradient descent, or SGD, is an improved version of gradient descent that is less likely to get stuck in local minima. For simple problems, the SGD algorithm performs well. Here, the SGD loss quickly falls below the losses for the more recently developed RMSprop and Adam optimizers on a simple minimization task: Adam and RMSprop require 10 times as many iterations to achieve a similar loss.
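The benchmark itself isn't reproduced in the transcript, but a comparison along those lines could be sketched as follows; the quadratic objective, learning rate, and iteration count are assumptions for illustration.

import tensorflow as tf

def loss_fn(x):
    return (x - 3.0)**2  # simple convex objective (illustrative)

# Run the same minimization with each of the three optimizers.
for make_opt in (tf.keras.optimizers.SGD,
                 tf.keras.optimizers.RMSprop,
                 tf.keras.optimizers.Adam):
    opt = make_opt(learning_rate=0.1)
    x = tf.Variable(10.0)
    for _ in range(50):
        with tf.GradientTape() as tape:
            loss = loss_fn(x)
        opt.apply_gradients(zip(tape.gradient(loss, [x]), [x]))
    print(make_opt.__name__, float(loss_fn(x)))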
6. The gradient descent optimizer
Let's move on to the TensorFlow implementation of these optimizers, starting with SGD, which you can instantiate using the Keras optimizers module. You then supply a learning rate, typically between 0.5 and 0.001, which determines how quickly the model parameters adjust during training. Think of a higher learning rate as exerting more force on the ball than gravity alone: the ball will move faster and skip over some plateaus, but it may miss the global minimum, too. The main advantage of SGD is that it is simpler and easier to interpret than more modern optimization algorithms.
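A minimal sketch of that instantiation; the 0.01 learning rate and the single toy variable are illustrative assumptions.

import tensorflow as tf

# Instantiate SGD from the Keras optimizers module.
opt = tf.keras.optimizers.SGD(learning_rate=0.01)

# Apply one update step to a toy variable.
x = tf.Variable(5.0)
with tf.GradientTape() as tape:
    loss = x**2
opt.apply_gradients(zip(tape.gradient(loss, [x]), [x]))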
7. The RMSprop optimizer
Next, we'll consider the root mean squared propagation, or RMSprop, optimizer, which has two advantages over SGD. First, it applies a different learning rate to each feature, which can be useful for high-dimensional problems. Second, it lets you both build momentum and allow it to decay: setting a low value for the decay parameter prevents momentum from accumulating over long periods during training.
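As a sketch, with illustrative values: in current tf.keras, momentum is set directly, while rho controls how quickly the moving average of past squared gradients decays; rho is the closest counterpart to the decay knob mentioned above, though the exact argument name has varied across TensorFlow versions.

import tensorflow as tf

# RMSprop with momentum (all values are illustrative assumptions).
opt = tf.keras.optimizers.RMSprop(
    learning_rate=0.01,
    momentum=0.9,  # builds velocity across update steps
    rho=0.8)       # lower rho: past gradients decay faster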
8. The Adam optimizer
Finally, the adaptive moment estimation, or Adam, optimizer provides further improvements and is generally a good first choice. Similar to RMSprop, you can make momentum decay faster by lowering the beta_1 parameter. Relative to RMSprop, the Adam optimizer tends to perform better with its default parameter values, which we will typically use.
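A sketch of both choices; the 0.8 value for beta_1 is an illustrative assumption.

import tensorflow as tf

# Adam with its default parameters, usually a good first choice.
opt_default = tf.keras.optimizers.Adam()

# Lowering beta_1 makes the momentum-like first-moment
# average decay faster.
opt_fast_decay = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.8)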
9. A complete example
Let's return to our credit card default prediction problem and assume that the features have been imported and the weights have been initialized. We'll then define a model that computes the predictions and a loss function that computes the binary_crossentropy loss, the standard choice for binary classification problems. Finally, we define an RMSprop optimizer with a learning rate of 0.1 and a momentum parameter of 0.9, and then perform minimization.
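The slide's code isn't reproduced in the transcript, so here is a minimal self-contained sketch of that pipeline; the random stand-in data, the shapes, and the simple linear-sigmoid model are assumptions made so the example runs on its own.

import tensorflow as tf

# Stand-in data for the credit card default problem
# (shapes and values are illustrative assumptions).
features = tf.random.normal((1000, 3))
targets = tf.cast(tf.random.uniform((1000, 1), maxval=2, dtype=tf.int32),
                  tf.float32)

# Initialize the weights and intercept.
weights = tf.Variable(tf.random.normal((3, 1)))
intercept = tf.Variable(0.0)

# Model: computes default probabilities from the features.
def model(weights, intercept, features):
    return tf.keras.activations.sigmoid(tf.matmul(features, weights) + intercept)

# Loss: binary cross-entropy, the standard for binary classification.
def loss_fn(weights, intercept, features, targets):
    predictions = model(weights, intercept, features)
    return tf.reduce_mean(tf.keras.losses.binary_crossentropy(targets, predictions))

# RMSprop optimizer with a learning rate of 0.1 and momentum of 0.9.
opt = tf.keras.optimizers.RMSprop(learning_rate=0.1, momentum=0.9)

# Perform minimization.
for _ in range(100):
    with tf.GradientTape() as tape:
        loss = loss_fn(weights, intercept, features, targets)
    grads = tape.gradient(loss, [weights, intercept])
    opt.apply_gradients(zip(grads, [weights, intercept]))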
10. Let's practice!
We now know how to find a minimum, so let's do that in some exercises.