
Discovering activation functions

1. Discovering activation functions

Welcome back. So far, we have seen neural networks made only of linear layers.

2. Activation functions

We can add non-linearity to our models using activation functions. We'll discuss two activation functions: sigmoid for binary classification and softmax for multi-class classification. This non-linearity allows networks to learn more complex interactions between inputs and targets than only linear relationships. We'll call the output of the last linear layer the "pre-activation" output, which we'll pass to activation functions to obtain the transformed output.

3. Meet the sigmoid function

The sigmoid activation function is widely used for binary classification problems. Let's say we are trying to classify an animal as a mammal or not.

4. Meet the sigmoid function

We have three pieces of information: the number of limbs, whether it lays eggs, and whether it has hair. The latter two are binary variables: 1 if yes, 0 if no.

5. Meet the sigmoid function

Passing the input to a model with two linear layers returns a single output: the number six. This number is not yet interpretable.
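A minimal sketch of such a model is below; the input values and layer sizes are illustrative assumptions, and because the weights are randomly initialized, the raw output will not be exactly six:

```python
import torch
import torch.nn as nn

# Input features: number of limbs, lays eggs (0/1), has hair (0/1).
# The values are illustrative, e.g. a dog: 4 limbs, no eggs, has hair.
input_tensor = torch.tensor([[4.0, 0.0, 1.0]])

# Two linear layers mapping 3 features to a single raw score;
# the hidden size of 8 is an arbitrary choice for illustration.
model = nn.Sequential(
    nn.Linear(3, 8),
    nn.Linear(8, 1)
)

pre_activation = model(input_tensor)
print(pre_activation)  # a single raw number, not yet interpretable
```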

6. Meet the sigmoid function

We pass the number six through the sigmoid function,

7. Meet the sigmoid function

transforming it into a probability between zero and one. We are now ready to perform binary classification! If the output is closer to one (greater than 0.5), we label it as class one (mammal). If it were less than 0.5, the prediction would be zero (not a mammal).

8. Meet the sigmoid function

Let's implement sigmoid in PyTorch! Here, nn.Sigmoid() takes a one-dimensional input_tensor containing the value six and returns an output of the same size, meaning it is also one-dimensional. The output is now bounded between zero and one.
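A minimal sketch of that call; the final thresholding line is added here for illustration:

```python
import torch
import torch.nn as nn

input_tensor = torch.tensor([6.0])  # the pre-activation output

sigmoid = nn.Sigmoid()
probability = sigmoid(input_tensor)
print(probability)  # tensor([0.9975]), bounded between zero and one

# Threshold at 0.5 to get the class label: 1 = mammal, 0 = not a mammal
prediction = (probability > 0.5).int()
print(prediction)  # tensor([1])
```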

9. Activation as the last layer

Typically, nn.Sigmoid() is added as the last step in nn.Sequential(), automatically transforming the output of the final linear layer. Interestingly, a neural network with only linear layers and a sigmoid activation behaves like logistic regression, but adding more layers and activations unlocks the true power of deep learning, which we'll see later.
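For instance, a sketch of such a network could look like this (the layer sizes are assumptions for illustration):

```python
import torch
import torch.nn as nn

# Sigmoid as the last step of nn.Sequential: the final linear layer's
# output is automatically transformed into a probability.
model = nn.Sequential(
    nn.Linear(3, 8),
    nn.Linear(8, 1),
    nn.Sigmoid()
)

input_tensor = torch.tensor([[4.0, 0.0, 1.0]])
output = model(input_tensor)  # already between zero and one
```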

10. Getting acquainted with softmax

We use softmax, another popular activation function, for multi-class classification involving more than two class labels. Let's say we have three classes:

11. Getting acquainted with softmax

bird (0),

12. Getting acquainted with softmax

mammal (1),

13. Getting acquainted with softmax

and reptile (2).

14. Getting acquainted with softmax

In this network, softmax takes a three-element pre-activation output and generates an output of the same shape, one by three.

15. Getting acquainted with softmax

The output is a probability distribution because each element is between zero and one, and values sum to one.

16. Getting acquainted with softmax

Here, the prediction is for the second class, mammals, which has the highest probability, 0.842.
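As a sketch, picking the predicted class from such a distribution could look like this; only 0.842 appears in the example, and the other two probabilities are placeholders chosen so the values sum to one:

```python
import torch

# Illustrative probability distribution over (bird, mammal, reptile)
probabilities = torch.tensor([[0.105, 0.842, 0.053]])

# The index of the highest probability is the predicted class
predicted_class = torch.argmax(probabilities, dim=-1)
print(predicted_class)  # tensor([1]) -> mammal
```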

17. Getting acquainted with softmax

In PyTorch, we use nn.Softmax(). dim=-1 indicates that softmax is applied to input_tensor's last dimension. Similar to sigmoid, softmax can be the last layer in nn.Sequential.
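A minimal sketch of both usages; the pre-activation values and layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

# A 1-by-3 pre-activation output (values chosen for illustration)
input_tensor = torch.tensor([[1.0, 3.1, 0.3]])

softmax = nn.Softmax(dim=-1)  # apply softmax along the last dimension
probabilities = softmax(input_tensor)
print(probabilities)        # each element is between zero and one
print(probabilities.sum())  # tensor(1.), a valid probability distribution

# Softmax can also be the last layer in nn.Sequential
model = nn.Sequential(
    nn.Linear(3, 3),
    nn.Softmax(dim=-1)
)
```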

18. Let's practice!

We've added two powerful activation functions, Sigmoid and Softmax, to our arsenal. Now, it's time to put them into practice!
