1. Activation functions
So far we've been using several activation functions in our models, but we haven't yet covered their role in neural networks beyond obtaining the output we want in the output layer.
2. An activation function
Inside the neurons of any neural network the same process takes place:
3. An activation function
A summation of the inputs reaching the neuron, each multiplied by the weight of its connection, plus the bias weight.
This operation results in a number, a, which can be any value; it is not bounded.
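As a minimal sketch, assuming a single neuron with made-up inputs x, connection weights w, and bias b, this is the operation being described:

import numpy as np

# Hypothetical inputs, connection weights and bias for one neuron
x = np.array([0.5, -1.2, 3.0])   # inputs reaching the neuron
w = np.array([0.8, 0.1, -0.4])   # weight of each connection
b = 0.2                          # bias weight

# Weighted sum of the inputs plus the bias: an unbounded number
a = np.dot(w, x) + b
print(a)  # roughly -0.72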
4. An activation function
We pass this number into an activation function that essentially takes it as an input and decides how the neuron fires and which output it produces.
Activation functions impact learning time, making our model converge faster or slower and achieve lower or higher accuracy. They also allow us to learn more complex functions.
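Continuing that sketch, passing the unbounded value a through an activation function such as the sigmoid produces the neuron's bounded output:

import numpy as np

def sigmoid(a):
    # Squashes any real number into the (0, 1) range
    return 1 / (1 + np.exp(-a))

a = -0.72                # the unbounded pre-activation value from before
output = sigmoid(a)      # the neuron's output after the activation
print(round(output, 2))  # about 0.33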
5. Activation zoo
Four very well known activation functions are:
The sigmoid, which varies between 0 and 1 for all possible X input values.
The tanh or Hyperbolic tangent, which is similar to the sigmoid in shape but varies between -1 and 1.
6. Activation zoo
The ReLU (Rectified Linear Unit), which varies between 0 and infinity, and
the leaky ReLU, a variant of ReLU that doesn't sit at 0 for negative inputs, instead allowing small negative output values; all four functions are sketched below.
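As a rough sketch, the four functions above can be written in plain numpy as follows (the leaky ReLU slope alpha is an illustrative choice):

import numpy as np

def sigmoid(x):
    # Output varies between 0 and 1
    return 1 / (1 + np.exp(-x))

def tanh(x):
    # Same S-shape as the sigmoid, but varies between -1 and 1
    return np.tanh(x)

def relu(x):
    # 0 for negative inputs, the input itself otherwise (0 to infinity)
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    # Like ReLU, but negative inputs produce small negative outputs
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), tanh(x), relu(x), leaky_relu(x))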
7. Effects of activation functions
Changing the activation function used in the hidden layer of the model we built for binary classification results in different classification boundaries.
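A minimal sketch of what swapping the hidden layer's activation could look like in Keras; the layer sizes and optimizer here are assumptions, not the exact model built earlier:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Hypothetical binary classifier with 2 input features; only the hidden
# layer's activation changes between experiments ('sigmoid', 'tanh' or
# 'relu'; leaky ReLU can also be added via the separate LeakyReLU layer
# in older Keras versions).
model = Sequential()
model.add(Dense(4, input_shape=(2,), activation='tanh'))  # hidden layer
model.add(Dense(1, activation='sigmoid'))                 # output layer
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])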
8. Effects of activation functions
We can see that the previous model cannot completely separate red crosses from blue circles if we use a sigmoid activation function in the hidden layer.
Some blue circles are misclassified as red crosses along the diagonal.
However, when we use tanh, we completely separate red crosses from blue circles, and the separation region between the blue and red classifications is smooth.
9. Effects of activation functions
Using a ReLU activation function, we obtain sharper boundaries; the leaky ReLU shows similar behavior for this dataset.
It's important to note that these boundaries will be different for every run of the same model because of the random initialization of weights and other random variables that aren't fixed.
10. Which activation function to use?
All activation functions come with their pros and cons. There's no easy way to determine which activation function is best to use.
Based on their properties, the problem at hand, and the layer we are looking at in our network, one activation function will perform better than another at achieving our goal.
A good way to go is to start with ReLU, since it trains fast and tends to generalize well to most problems; avoid sigmoids in hidden layers; and tune with experimentation.
11. Comparing activation functions
It's easy to compare how models with different activation functions perform if they are small enough and train fast.
It's important to set a random seed with numpy so that the model weights are initialized the same way for each activation function.
We then define a function that returns a fresh new model each time, using the act_function parameter.
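A sketch of that setup; the seed value, layer sizes, and compile settings are assumptions:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Set the numpy seed so weight initialization is repeatable across models
np.random.seed(1)

def get_model(act_function):
    # Returns a fresh, untrained model using act_function in its hidden layer
    model = Sequential()
    model.add(Dense(4, input_shape=(2,), activation=act_function))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model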
12. Comparing activation functions
We can then use this function as we loop over several activation functions, training a different model for each and saving its History callback. We store all these callbacks in a dictionary.
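A sketch of that loop, assuming X_train, y_train, X_test and y_test already exist; note that 'leaky_relu' is only accepted as a built-in activation string in recent Keras versions:

# Activation functions to compare
activations = ['relu', 'leaky_relu', 'sigmoid', 'tanh']

# Dictionary mapping each activation name to its History callback
activation_results = {}

for act in activations:
    model = get_model(act_function=act)
    # fit() returns a History callback holding loss and metrics per epoch
    h_callback = model.fit(X_train, y_train,
                           validation_data=(X_test, y_test),
                           epochs=20, verbose=0)
    activation_results[act] = h_callback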
13. Comparing activation functions
With this dictionary of histories, we can extract the metrics we want to plot, build a pandas dataframe and plot it.
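A sketch of that final step; the metric key is 'val_accuracy' in recent Keras versions ('val_acc' in older ones):

import pandas as pd
import matplotlib.pyplot as plt

# Extract the validation accuracy per epoch from each History callback
val_acc_per_act = {act: h.history['val_accuracy']
                   for act, h in activation_results.items()}

# One column per activation function, one row per epoch
val_acc = pd.DataFrame(val_acc_per_act)

val_acc.plot(title='Validation accuracy per activation function')
plt.xlabel('Epochs')
plt.ylabel('Validation accuracy')
plt.show()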
14. Let's practice!
Let's explore the effects of activation functions!