1. Backpropagation
You’ve used gradient descent to optimize weights in a simple model. Now we'll add a technique called “backpropagation” to calculate the slopes you need to optimize more complex deep learning models.
2. Backpropagation
Just as forward propagation sends input data through the hidden layers and into the output layer, backpropagation takes the error from the output layer and
3. Backpropagation
propagates it backward through the
4. Backpropagation
hidden layers, towards the input layer.
5. Backpropagation
It calculates the necessary slopes sequentially
from the weights closest to the prediction, through the hidden layers, and eventually back to the weights coming from the inputs. We then use these slopes to update our weights, as you've seen. Backpropagation is tricky, so focus on the general structure of the algorithm rather than trying to memorize every mathematical detail.
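As a quick reminder of how those slopes get used, here is a minimal sketch of the gradient-descent update step. The weights, slopes, and learning rate are made-up numbers, not values from the upcoming slides.

```python
import numpy as np

learning_rate = 0.01

weights = np.array([1.0, 2.0])    # current weights
slopes = np.array([4.0, -6.0])    # slopes of the loss with respect to those weights

# Gradient descent: step each weight against its slope to reduce the loss
weights = weights - learning_rate * slopes
print(weights)                    # [0.96 2.06]
```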
6. Backpropagation process
In the big picture, we are trying to estimate the slope of the loss function with respect to each weight in our network. You've already seen that we use prediction errors to calculate some of those slopes. So we always do forward propagation to make a prediction and calculate an error before we do backpropagation.
7. Backpropagation process
Here are the results of forward propagation. Node values are in white and weights are in black. We need to be at this step before we can start backpropagation. Notice that we are using the ReLU activation function, so any node
8. Backpropagation process
whose input is negative takes a value of 0, and that happens in the top node of the first hidden layer.
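Since the slide's diagram isn't reproduced here, the numbers below are made up, but the sketch follows the same pattern: a forward pass with ReLU in which the top hidden node's input comes out negative and is clamped to 0, followed by the error we need before backpropagation can start.

```python
import numpy as np

def relu(x):
    """ReLU activation: negative inputs become 0, positive inputs pass through."""
    return np.maximum(0, x)

# Hypothetical inputs, weights, and target (not the values from the slide)
input_data = np.array([2, 3])
weights = {
    "hidden_0": np.array([1, -2]),   # feeds the top hidden node
    "hidden_1": np.array([2, 1]),    # feeds the bottom hidden node
    "output": np.array([3, 2]),
}
target = 10

# Forward propagation: the top node's input is 2*1 + 3*(-2) = -4,
# so ReLU sets its value to 0.
hidden_0_value = relu(input_data @ weights["hidden_0"])   # 0
hidden_1_value = relu(input_data @ weights["hidden_1"])   # 7
hidden_values = np.array([hidden_0_value, hidden_1_value])

prediction = hidden_values @ weights["output"]            # 14

# The prediction error is what backpropagation sends back through the network
error = prediction - target                               # 4
```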
9. Backpropagation process
For backpropagation, we go back one layer at a time, and each time we go back a layer, we use the formula for slopes that you saw in the last video. Every weight feeds from some input node into some output node. The three things we multiply to get the slope for that weight are: one, the value at the weight's input node; two, the slope from plotting the loss function against that weight's output node; and three, the slope of the activation function at the weight's output node. We know the value at the node feeding into this weight: either that node is in the input layer, in which case we have its value from the data, or it is in a hidden layer, in which case we calculated its value when we did forward propagation. The second item on this list is the slope of the loss function with respect to the output node. We do backpropagation from the right side of our diagram to the left, so we already calculated that slope by the time we need to plug it into the current calculation. Finally, we need the slope of the activation function at the node the weight feeds into.
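As a concrete, purely illustrative instance of that three-term product, here is a sketch that assumes a squared-error loss (so the loss slope at the output node is two times the error); none of the numbers come from the slide.

```python
# Slope for one weight = (value at the weight's input node)
#                      * (slope of the loss w.r.t. the weight's output node)
#                      * (slope of the activation function at the output node)
node_value = 7           # value at the node the weight comes from (from forward prop)
error = 4                # prediction minus target
loss_slope = 2 * error   # slope of a squared-error loss w.r.t. the output node
activation_slope = 1     # ReLU slope is 1 because this node's input was positive

slope_for_weight = node_value * loss_slope * activation_slope
print(slope_for_weight)  # 56
```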
10. ReLU Activation Function
You can see from this diagram that, for the ReLU function, the slope is 0 if the input into a node is negative. If the input into the node is positive, the output is the same as the input, so the slope would be 1.
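A minimal sketch of that slope rule (the behavior at exactly 0 is a convention; here it is treated as 0):

```python
import numpy as np

def relu_slope(node_input):
    """Slope of ReLU: 0 where the input is negative, 1 where it is positive."""
    return np.where(node_input > 0, 1, 0)

print(relu_slope(np.array([-4, 7])))   # [0 1]
```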
11. Backpropagation process
So far, we have focused on calculating slopes of the loss function with respect to weights. We also keep track of the slopes of the loss function with respect to node values, because we use those slopes in our calculations of slopes at weights. The slope of the loss function with respect to any node value is the sum of the slopes for every weight coming out of that node.
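In the fuller calculation, each of those per-weight terms is the outgoing weight multiplied by the loss slope already computed at the node it feeds into. Here is a minimal sketch with illustrative numbers, assuming one hidden node that feeds two output nodes.

```python
import numpy as np

# Slopes of the loss at the two nodes this hidden node feeds into
# (already computed, since backpropagation works right to left)
slopes_at_fed_nodes = np.array([8.0, -2.0])

# The weights coming out of the hidden node, one per node it feeds
weights_out = np.array([3.0, 2.0])

# Slope of the loss with respect to this hidden node's value
slope_for_node_value = np.sum(weights_out * slopes_at_fed_nodes)
print(slope_for_node_value)   # 20.0
```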
12. Let's practice!