1. Discovering activation functions
Welcome back. So far, we have seen neural networks made only of linear layers.

2. Activation functions
We can add non-linearity to our models using activation functions. We'll discuss two activation functions: sigmoid for binary classification and softmax for multi-class classification. This non-linearity allows networks to learn more complex interactions between inputs and targets than only linear relationships. We'll call the output of the last linear layer the "pre-activation" output, which we'll pass to activation functions to obtain the transformed output.

3. Meet the sigmoid function
The sigmoid activation function is widely used for binary classification problems. Let's say we are trying to classify an animal as a mammal or not. We have three pieces of information: the number of limbs, whether it lays eggs, and whether it has hair. The latter two are binary variables: 1 if yes, 0 if no. Passing the input to a model with two linear layers returns a single output: the number six. This number is not yet interpretable. We pass the number six through the sigmoid function, transforming it into a probability between zero and one. We are now ready to perform binary classification! If the output is closer to one (greater than 0.5), we label it as class one (mammal); if it is less than 0.5, the prediction is zero (not a mammal).
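As a rough sketch of that setup (the hidden size and weights below are illustrative assumptions, not the values from the slides), two stacked linear layers map the three input features to a single raw score, the pre-activation output:

```python
import torch
import torch.nn as nn

# Three input features: number of limbs, lays eggs (0/1), has hair (0/1)
input_tensor = torch.tensor([[4.0, 0.0, 1.0]])

# Two linear layers; the hidden size (8) is an arbitrary choice for illustration
model = nn.Sequential(
    nn.Linear(3, 8),
    nn.Linear(8, 1),
)

pre_activation = model(input_tensor)
print(pre_activation)  # a single raw score -- not yet interpretable as a probability
```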
Let's implement sigmoid in PyTorch! Here, nn.Sigmoid() takes a one-dimensional input_tensor with the value six and returns an output of the same size, meaning it is also one-dimensional. The output is now bounded between zero and one.
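A minimal sketch of that step (the variable names are assumed, not prescribed by the slides):

```python
import torch
import torch.nn as nn

input_tensor = torch.tensor([6.0])  # one-dimensional pre-activation output

sigmoid = nn.Sigmoid()
output = sigmoid(input_tensor)

print(output)        # tensor([0.9975]) -- bounded between zero and one
print(output.shape)  # torch.Size([1]) -- same size as the input
```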

9. Activation as the last layer
Typically, nn.Sigmoid() is added as the last step in nn.Sequential(), automatically transforming the output of the final linear layer. Interestingly, a neural network with only linear layers and a sigmoid activation behaves like logistic regression, but adding more layers and activations unlocks the true power of deep learning, which we'll see later.
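A sketch of that pattern, with illustrative layer sizes (the transcript does not specify them):

```python
import torch
import torch.nn as nn

# Sigmoid as the last step: the final linear layer's output is
# automatically squashed into a probability between zero and one
model = nn.Sequential(
    nn.Linear(3, 8),  # hidden size 8 is an arbitrary choice
    nn.Linear(8, 1),
    nn.Sigmoid(),
)

input_tensor = torch.tensor([[4.0, 0.0, 1.0]])
probability = model(input_tensor)
print(probability)  # a value in (0, 1); greater than 0.5 means class one (mammal)
```

With only linear layers followed by a final sigmoid, this model behaves like logistic regression, as noted above.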

10. Getting acquainted with softmax
We use softmax, another popular activation function, for multi-class classification involving more than two class labels. Let's say we have three classes: bird (0), mammal (1), and reptile (2).
In this network, softmax takes a three-dimensional pre-activation output and generates an output of the same shape, one by three. The output is a probability distribution because each element is between zero and one, and the values sum to one. Here, the prediction is the second class, mammals, which has the highest probability, 0.842.
In PyTorch, we use nn.Softmax(). dim=-1 indicates that softmax is applied to input_tensor's last dimension. Similar to sigmoid, softmax can be the last layer in nn.Sequential().
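A minimal sketch, using made-up pre-activation values chosen so the second class comes out on top (these are not the numbers from the slide):

```python
import torch
import torch.nn as nn

# Pre-activation output of shape 1 x 3: one score per class (bird, mammal, reptile)
input_tensor = torch.tensor([[1.0, 3.0, 0.5]])  # illustrative values

softmax = nn.Softmax(dim=-1)  # apply softmax along the last dimension
probabilities = softmax(input_tensor)

print(probabilities)           # same 1 x 3 shape, each entry between zero and one
print(probabilities.sum())     # tensor(1.) -- a valid probability distribution
print(probabilities.argmax())  # tensor(1) -- highest probability: class 1, mammal
```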

18. Let's practice!
We've added two powerful activation functions, sigmoid and softmax, to our arsenal. Now, it's time to put them into practice!