
Training a network in TensorFlow

1. Training a network in TensorFlow

In the final video in this chapter, we'll wrap up by discussing important topics related to training neural networks in TensorFlow.

2. Initializing variables

We saw that finding the global minimum can be difficult, even when we're minimizing a simple loss function. We also saw that we could improve our chances by selecting better initial values for variables. But what can we do for more challenging problems with many variables? Take the eggholder function, for example, which has many local minima. It is difficult to see a global minimum on the plot, but it has one. How can we select initial values for x and y, the two inputs to the eggholder function? Even worse, what if we have a loss function that depends on hundreds of variables?
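For reference, here is a minimal sketch of the eggholder function in Python, following its standard definition (the location and value of the global minimum are quoted only approximately):

    import numpy as np

    def eggholder(x, y):
        # Standard two-input eggholder test function, which has many local minima
        return (-(y + 47) * np.sin(np.sqrt(np.abs(x / 2 + (y + 47))))
                - x * np.sin(np.sqrt(np.abs(x - (y + 47)))))

    # The global minimum lies near x = 512, y = 404.23
    print(eggholder(512.0, 404.2319))  # approximately -959.64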

3. Random initializers

We often need to initialize hundreds or thousands of variables. Simply using ones will not work. And selecting initial values individually is tedious and infeasible in many cases. A natural alternative to this is to use random or algorithmic generation of initial values. We can, for instance, draw them from a probability distribution, such as the normal or uniform distributions. There are also specialized options, such as the Glorot initializers, which are designed for ML algorithms.
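For instance, initial values could be generated like this, a minimal sketch in which the shapes and bounds are arbitrary:

    import tensorflow as tf

    # Draw initial values from a normal or a uniform distribution
    normal_draws = tf.random.normal([64, 64])
    uniform_draws = tf.random.uniform([64, 64], minval=-0.1, maxval=0.1)

    # Or use a specialized initializer, such as the Glorot uniform initializer
    glorot = tf.keras.initializers.GlorotUniform()
    glorot_draws = glorot(shape=(64, 64))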

4. Initializing variables in TensorFlow

Let's start by using the low-level approach to initialize a 500x500 variable. We can do this using draws from a random normal distribution by passing the shape 500, 500 to tf dot random dot normal and passing the result to tf dot Variable. Alternatively, we could use the truncated random normal distribution, which discards very large and very small draws.
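As a rough sketch of that low-level approach (the variable name is illustrative):

    import tensorflow as tf

    # Initialize a 500x500 variable with draws from a random normal distribution
    weights = tf.Variable(tf.random.normal([500, 500]))

    # Alternatively, use the truncated random normal distribution,
    # which discards very large and very small draws and redraws them
    weights = tf.Variable(tf.random.truncated_normal([500, 500]))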

5. Initializing variables in TensorFlow

We can also use the high-level approach by initializing a dense layer using the default Keras option, currently the Glorot uniform initializer, as we've done in all exercises thus far. If we instead wish to initialize values to zero, we can do this using the kernel initializer parameter.
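In code, that high-level approach might look as follows (the layer size is an assumption for illustration):

    import tensorflow as tf

    # A dense layer with the default Keras kernel initializer, currently glorot_uniform
    dense = tf.keras.layers.Dense(32, activation='relu')

    # The same layer with its weights initialized to zeros instead
    dense = tf.keras.layers.Dense(32, activation='relu', kernel_initializer='zeros')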

6. Neural networks and overfitting

Overfitting is another important issue you'll encounter when training neural networks. Let's say you have a linear relationship between two variables. You decide to represent this relationship with a linear model, shown in red, and a more complex model, shown in blue. The complex model perfectly predicts the values in the training set, but performs worse on the test set. The complex model performed poorly because it overfit: it simply memorized training examples, rather than learning general patterns. Overfitting is especially problematic for neural networks, which contain many parameters and are quite good at memorization.

7. Applying dropout

A simple solution to the overfitting problem is to use dropout, an operation that will randomly drop the weights connected to certain nodes in a layer during the training process, as shown on the right. This will force your network to develop more robust rules for classification, since it cannot rely on any particular nodes being passed to an activation function. This will tend to improve out-of-sample performance.
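A minimal sketch of this behavior, showing that nodes are only dropped during training:

    import tensorflow as tf

    # A dropout layer that randomly drops 25% of its inputs
    dropout = tf.keras.layers.Dropout(0.25)

    inputs = tf.ones((1, 8))
    print(dropout(inputs, training=True))   # about 25% of values zeroed, the rest scaled up
    print(dropout(inputs, training=False))  # inputs passed through unchanged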

8. Implementing dropout in a network

Let's look at how dropout works. We first define an input layer using the borrower features from our credit card dataset as an input. We then pass the input layer to a dense layer, which has 32 nodes and uses a relu activation function.

9. Implementing dropout in a network

We'll next pass the first dense layer to a second layer, which reduces the number of output nodes to 16. Before passing those nodes to the output layer, we'll apply a dropout layer. The only argument specifies that we want to drop the weights connected to 25% of nodes randomly. We'll then pass this to the output layer, which reduces the 16 nodes to 1 and applies a sigmoid activation function.
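Putting the two steps together, a minimal sketch of this model might look as follows; borrower_features and its ten columns are placeholder assumptions standing in for the credit card data, and the relu activation in the second dense layer is also assumed:

    import numpy as np
    import tensorflow as tf

    # Placeholder standing in for the borrower features from the credit card dataset
    borrower_features = np.random.uniform(size=(100, 10)).astype(np.float32)

    # Define the input layer
    inputs = tf.keras.Input(shape=(10,))

    # First dense layer with 32 nodes and a relu activation
    dense1 = tf.keras.layers.Dense(32, activation='relu')(inputs)

    # Second dense layer reduces the number of output nodes to 16
    dense2 = tf.keras.layers.Dense(16, activation='relu')(dense1)

    # Dropout layer that randomly drops the weights connected to 25% of nodes
    dropout1 = tf.keras.layers.Dropout(0.25)(dense2)

    # Output layer reduces 16 nodes to 1 and applies a sigmoid activation
    outputs = tf.keras.layers.Dense(1, activation='sigmoid')(dropout1)

    # Assemble the model and check its structure
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    model.summary()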

10. Let's practice!

You now know everything you need to construct and train a neural network, so let's do that with some exercises.