
Backpropagation in practice

1. Backpropagation in practice

Let’s see how backpropagation works in a deeper network.

2. Backpropagation

Start at the last set of weights. Those are currently 1 and 2.

3. Backpropagation

We multiply 3 things. The node values feeding into these weights

4. Backpropagation

are 1 and 3. The relevant slope for the output node is 2 times the error, which is 6. And the slope of the activation function is 1, since the input to the output node is positive. So,

5. Backpropagation

we have a slope for the top weight of 6, and a slope for the bottom weight of 18. Those slopes we just calculated feed into the formula associated with the weights further back in the network. Let's do that calculation one layer back now. We’ve hidden the earlier and later layers, since we don’t need them to calculate the slopes for this layer of the network. This graph uses white to denote node values, black to denote weight values, and red to denote the calculated slopes of the loss function with respect to each node, which we just finished calculating. This is all the information we need to calculate the slopes of the loss function with respect to the weights in this diagram.
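As a rough sketch of that arithmetic in Python (the node values, the slope of 6, and the ReLU slope of 1 come from the slide; an error of 3 is implied by the stated slope of 2 times the error):

```python
import numpy as np

# Values from the diagram for the last set of weights.
node_values = np.array([1, 3])   # white: values of the nodes feeding into these weights
error = 3                        # implied, since "2 times the error" is 6
loss_slope = 2 * error           # slope of the loss at the output node: 6
activation_slope = 1             # ReLU slope, since the input to the output node is positive

# Multiply the three things together to get the slope for each weight.
weight_slopes = node_values * loss_slope * activation_slope
print(weight_slopes)             # [ 6 18]
```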

6. Calculating slopes associated with any weight

Recall the three things we multiply to get the slope associated with any weight: the value at the node feeding into the weight, the slope of the activation function for the node being fed into (that slope is 1 in all cases here), and the slope of the loss function with respect to the output node. Let's start with

7. Backpropagation

the slopes related to the weights going into the top node. For the top weight going into the top node, we multiply

8. Backpropagation

0 for the input node's value, which is in white. Times

9. Backpropagation

6 for the output node's slope, which is in red. Times the derivative of the ReLU activation function. That output node has a positive value for the input, so the ReLU activation has

10. Backpropagation

a slope of 1. 0 times 6 times 1 is 0. For the other weight going into this node, we have

11. Backpropagation

1 times 6 times the slope of the ReLU activation function at the output node's value. The slope of the activation function is still 1. So, we have 1 times 6 times 1, which is 6. Here we also show slopes associated with

12. Backpropagation

the other two weights. We would multiply them all by a learning rate, and use the results to update the weights in gradient descent, as in the sketch below. Pause the video and make sure you understand how these last two slopes were calculated. You are through the hardest concepts in this course, which are gradient descent and backpropagation.
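Here is a minimal sketch of those hidden-layer slopes and the gradient descent step they feed into; the current weight values and the learning rate are illustrative assumptions, not values from the slide:

```python
import numpy as np

# The two weights going into the top hidden node, using the same three-factor rule.
input_values = np.array([0, 1])   # white: values of the nodes feeding into these weights
node_slope = 6                    # red: slope of the loss with respect to the node fed into
activation_slope = 1              # ReLU slope at that node's positive input
weight_slopes = input_values * node_slope * activation_slope
print(weight_slopes)              # [0 6]

# Gradient descent update (weight values and learning rate are hypothetical here).
weights = np.array([1.0, 2.0])
learning_rate = 0.01
weights = weights - learning_rate * weight_slopes
print(weights)                    # [1.   1.94]
```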

13. Backpropagation: Recap

As a recap, we start at some random set of weights. We then go through the following iterative process: use forward propagation to make a prediction, use backpropagation to calculate the slope of the loss function with respect to each weight, multiply that slope by the learning rate, and subtract the result from the current weights. Keep going with that cycle until we get to a flat part.
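A self-contained toy version of that cycle, shrunk to a single weight so the whole loop fits in a few lines (the data and learning rate here are made up for illustration):

```python
import numpy as np

# Tiny model: predictions are weight * x, loss is mean squared error.
x = np.array([1.0, 2.0, 3.0])
target = np.array([2.0, 4.0, 6.0])
weight, learning_rate = 0.0, 0.1

for step in range(20):
    predictions = weight * x                           # forward propagation
    slope = np.mean(2 * (predictions - target) * x)    # backpropagation: slope of the loss
    weight = weight - learning_rate * slope            # gradient descent update
print(weight)                                          # close to 2.0, where the loss is flat
```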

14. Stochastic gradient descent

For computational efficiency, it is common to calculate slopes on only a subset of the data, called a batch, for each update of the weights. You then use a different batch of data to calculate the next update. Once we have used all our data, we start over again at the beginning of the data. Each pass through the full training data is called an epoch, so if we're going through our data for the third time, we'd say we are on the third epoch. When slopes are calculated on one batch at a time, rather than on the full data, this is called stochastic gradient descent, as opposed to gradient descent, which uses all of the data for each slope calculation. The process will be partially automated for you, but understanding it will help you fix any surprises that come up when building your models.
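A sketch of how batches and epochs fit together, reusing the same toy one-weight model; the data, batch size, and learning rate are illustrative:

```python
import numpy as np

# Six data points, batches of two: each epoch makes three weight updates.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
target = 2 * x
weight, learning_rate, batch_size = 0.0, 0.02, 2

for epoch in range(3):                                # three passes through the full data
    for start in range(0, len(x), batch_size):        # one batch per weight update
        xb = x[start:start + batch_size]
        tb = target[start:start + batch_size]
        slope = np.mean(2 * (weight * xb - tb) * xb)  # slope calculated on this batch only
        weight = weight - learning_rate * slope       # stochastic gradient descent step
print(weight)                                         # approaches 2.0 as the epochs go by
```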

15. Let's practice!