1. Hidden layers and parameters
So far, we've used one input layer and one linear layer. Now, we'll add more layers to help the network learn complex patterns.
2. Stacking layers with nn.Sequential()
We'll stack three linear layers using nn.Sequential(), a PyTorch container that runs layers in sequence. The network takes an input, passes it through each linear layer in turn, and returns an output. In this case, the layers within nn.Sequential() are hidden layers. The first argument of the first layer is the number of input features (n_features), and the last argument of the final layer is the number of output classes (n_classes); both are defined by the dataset.
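As a minimal sketch, such a model could be defined like this (the hidden sizes of 18 and 20 are illustrative choices, and the values of n_features and n_classes would come from the dataset):

import torch.nn as nn

n_features = 10  # number of input features (defined by the dataset)
n_classes = 5    # number of output classes (defined by the dataset)

model = nn.Sequential(
    nn.Linear(n_features, 18),  # first layer: takes n_features inputs
    nn.Linear(18, 20),
    nn.Linear(20, n_classes),   # final layer: outputs n_classes values
)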
3. Adding layers
We can add as many hidden layers as we like, as long as the input dimension of each layer matches the output dimension of the previous one. In our three-layer example, the first layer takes 10 features and outputs 18, the second takes 18 and outputs 20, and the third takes 20 and outputs 5.
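To confirm that these dimensions line up, we can pass a random batch through the network (the batch size of 32 here is arbitrary):

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 18),  # takes 10 features, outputs 18
    nn.Linear(18, 20),  # takes 18, outputs 20
    nn.Linear(20, 5),   # takes 20, outputs 5
)

x = torch.randn(32, 10)  # a batch of 32 samples with 10 features each
print(model(x).shape)    # torch.Size([32, 5])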
4. Layers are made of neurons
A layer is fully connected when each neuron links to all neurons in the previous layer, as shown in red in the figure. Each neuron in a linear layer performs a linear operation over all neurons from the previous layer. A single neuron therefore has N + 1 learnable parameters: N weights, where N is the output dimension of the previous layer, plus one bias.
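We can verify this by inspecting the weight and bias of a single linear layer (a small example with 4 inputs and 2 neurons, with sizes chosen for illustration):

import torch.nn as nn

layer = nn.Linear(4, 2)    # 4 inputs, 2 neurons
print(layer.weight.shape)  # torch.Size([2, 4]): 4 weights per neuron
print(layer.bias.shape)    # torch.Size([2]): 1 bias per neuron
# Each neuron has 4 + 1 = 5 learnable parameters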
5. Parameters and model capacity
Increasing the number of hidden layers increases the total number of parameters in the model, also known as the model's capacity. Higher-capacity models can handle more complex datasets but may take longer to train. An effective way to assess a model's capacity is to calculate its total number of parameters. Let's break it down with a two-layer network. The first layer has four neurons, each with eight weights and one bias, which gives 4 x (8 + 1) = 36 parameters. The second layer has two neurons, each with four weights and one bias, which gives 2 x (4 + 1) = 10 parameters. Adding them together, this model has 46 learnable parameters in total.
We can also calculate this in PyTorch using the .numel() method, which returns the number of elements in a tensor. By looping over the model's parameters and summing the number of elements in each, we can confirm that the total is 46.
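A minimal sketch of this check for the two-layer example above:

import torch.nn as nn

model = nn.Sequential(
    nn.Linear(8, 4),  # 4 neurons x (8 weights + 1 bias) = 36 parameters
    nn.Linear(4, 2),  # 2 neurons x (4 weights + 1 bias) = 10 parameters
)

total = sum(p.numel() for p in model.parameters())
print(total)  # 46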
6. Balancing complexity and efficiency
Understanding parameter count helps us balance model complexity and efficiency. Too many parameters can lead to long training times or overfitting, while too few might limit learning capacity.
7. Let's practice!
Let's practice!