1. Activation functions
But creating this multiply-add process is only half the story for hidden layers. For neural networks to achieve their maximum predictive power, we must also apply something called an activation function in the hidden layers.
2. Linear vs. non-linear functions
An activation function allows the model to capture non-linearities. Non-linearities, as shown on the right here, capture patterns like how going from no children to one child may affect your banking transactions differently than going from three children to four. On the left we have examples of linear functions, which are straight lines, and on the right, non-linear functions. If the relationships in the data aren't straight-line relationships, we will need an activation function that captures non-linearities.
3. Activation functions
An activation function is something applied to the value coming into a node, which then transforms it into the value stored in that node, or the node output.
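As a rough sketch in NumPy (the input values, the weights, and the choice of tanh here are ours, purely for illustration), the value coming into a node and the value it stores look like this:

```python
import numpy as np

# Hypothetical input row and weights for a single hidden node (illustration only).
input_data = np.array([2, 3])
node_weights = np.array([1, 1])

# Value coming into the node: the multiply-add (weighted sum) of the inputs.
node_input = (input_data * node_weights).sum()

# Value stored in the node: the activation function applied to that input.
node_output = np.tanh(node_input)

print(node_input)   # 5
print(node_output)  # ~0.9999
```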
4. Improving our neural network
Let's go back to the previous diagram. The top hidden node previously had a value of 5. For a long time, an s-shaped function called tanh was a popular activation function.
5. Activation functions
If we used the tanh activation function, this node's value would be tanh(5), which is very close to 1.
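As a quick sanity check using NumPy's built-in tanh (a sketch, not the course code), you can see how close tanh(5) is to 1:

```python
import numpy as np

# tanh squashes any real-valued input into the range (-1, 1).
print(np.tanh(5))    # 0.9999..., very close to 1
print(np.tanh(0))    # 0.0
print(np.tanh(-5))   # -0.9999...
```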
6. ReLU (Rectified Linear Activation)
Today, the standard in both industry and research applications is something called ReLU, the rectified linear activation function, which is depicted here. Though it has just two linear pieces, it's surprisingly powerful when composed through multiple successive hidden layers, as you will see soon.
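Here is a minimal sketch of ReLU (this `relu` helper is our own definition, not a library function):

```python
import numpy as np

def relu(x):
    """Rectified Linear Activation: pass positive inputs through, clip negatives to 0."""
    return np.maximum(0, x)

print(relu(5))    # 5
print(relu(-3))   # 0
```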
7. Activation functions
The code that incorporates activation functions is shown here. It is the same as the code you saw previously, but we've distinguished the input from the output in each node, which is shown in these lines and then again here. And we've applied the tanh function to convert each node's input to its output. That gives us a prediction of 1.2 transactions.
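Putting it together, here is a sketch of what that forward pass could look like. The input values and weights below are assumptions chosen to match the diagram (a top hidden node input of 5 and a final prediction of roughly 1.2), not necessarily the exact code on the slide:

```python
import numpy as np

# Assumed example data and weights (chosen to reproduce the values in the diagram).
input_data = np.array([2, 3])
weights = {'node_0': np.array([1, 1]),
           'node_1': np.array([-1, 1]),
           'output': np.array([2, -1])}

# Top hidden node: multiply-add gives the node input, tanh gives the node output.
node_0_input = (input_data * weights['node_0']).sum()   # 5
node_0_output = np.tanh(node_0_input)                   # ~0.9999

# Bottom hidden node: same pattern.
node_1_input = (input_data * weights['node_1']).sum()   # 1
node_1_output = np.tanh(node_1_input)                   # ~0.7616

# Output layer: multiply-add applied to the hidden-node outputs.
hidden_layer_outputs = np.array([node_0_output, node_1_output])
output = (hidden_layer_outputs * weights['output']).sum()

print(output)  # ~1.2 transactions
```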
8. Let's practice!
In the exercise, you will use the Rectified Linear Activation function, or ReLU, in your network.