1. Writing our first training loop
We now have the core components to train a PyTorch deep learning model!
2. Training a neural network
Once we create a model, choose a loss function, define a dataset, and set an optimizer, we're ready to train. This involves looping through the dataset, calculating the loss, computing gradients, and updating model parameters.
This process, called the training loop, repeats multiple times.
3. Introducing the Data Science Salary dataset
Writing the training loop ourselves allows for greater flexibility and control, giving us the option to customize each element. We'll work with a dataset of data scientist salaries to see this in action.
Features are categorical, and the target is salary in US dollars, already normalized.
Since the target is a continuous value, this is a regression problem. For regression, the model's final layer is a plain linear layer, with no softmax or sigmoid activation on the output.
Additionally, we'll apply a regression-specific loss function, as cross-entropy is only used for classification tasks.
4. Mean Squared Error Loss
We can use mean squared error (MSE) loss for regression problems.
The MSE loss is the mean of the squared difference between predictions and ground truth, as shown in the Python sketch below.
In PyTorch, we instantiate nn.MSELoss and use it as a criterion. Note that both predictions and targets must be float tensors.
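As a minimal sketch (the tensor values here are purely illustrative):

```python
import numpy as np
import torch
import torch.nn as nn

# NumPy implementation: mean of squared differences
def mean_squared_loss(prediction, target):
    return np.mean((prediction - target) ** 2)

# PyTorch equivalent: both tensors must be floats
criterion = nn.MSELoss()
prediction = torch.tensor([3.0, 5.0])
target = torch.tensor([2.5, 5.5])
loss = criterion(prediction, target)
print(loss)  # tensor(0.2500)
```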
5. Before the training loop
Let's put everything together now.
We have two NumPy arrays, "features" and "target", containing our data and labels. We start by converting them to float tensors and passing them to TensorDataset(), which pairs each row of features with its target.
This cast is required because float is the data type used by our model's parameters. We can now load the dataset into the DataLoader() class to enable batching.
Here we use a small batch size of four, but the batch size can be tuned to the use case.
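A sketch of these two steps, assuming features and target are the NumPy arrays mentioned above (the shuffle flag and the reshape of the target to shape (N, 1), which matches the model's single output, are choices made for this sketch):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Cast the NumPy arrays to float tensors and pair them up
dataset = TensorDataset(
    torch.tensor(features).float(),
    torch.tensor(target).float().reshape(-1, 1),
)

# Batch the dataset; reshuffling each epoch is a common choice
dataloader = DataLoader(dataset, batch_size=4, shuffle=True)
```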
We create our model next. This dataset has four input features and one target (output). We won't need to one-hot encode the target, as this is a regression problem rather than classification.
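For example, a small sequential model could look like this (the hidden layer of size eight is an arbitrary choice for the sketch; the source does not specify an architecture):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 8),   # four input features
    nn.ReLU(),
    nn.Linear(8, 1),   # one linear output for regression, no sigmoid or softmax
)
```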
Finally, we create the MSE loss criterion and the optimizer. A learning rate of 0.001 is a good default for most deep learning problems.
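Putting the last two pieces in place (SGD is one common choice of optimizer; the exact optimizer is an assumption here):

```python
import torch.optim as optim

criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)
```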
6. The training loop
We can now loop through the dataset multiple times.
Looping through the entire dataset once is called an epoch, and we train over multiple epochs, indicated by num_epochs.
For each epoch, we loop through the dataloader. Each iteration of the dataloader provides a batch of samples, which we saw earlier.
Before the forward pass, we set the gradients to zero using optimizer.zero_grad(), because PyTorch accumulates gradients from previous steps by default.
We get the features and targets from each batch provided by the dataloader.
We use the features for the forward pass of the model, and we use the targets for the loss calculation. Calling loss.backward() then computes the gradients, and finally we use the optimizer to update the model's parameters.
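Assembled into a full sketch, reusing the objects defined above (num_epochs is set to ten purely for illustration):

```python
num_epochs = 10  # illustrative value

for epoch in range(num_epochs):
    for features_batch, target_batch in dataloader:
        # Reset gradients accumulated from the previous step
        optimizer.zero_grad()
        # Forward pass: get predictions for this batch
        prediction = model(features_batch)
        # Compute the loss against the targets
        loss = criterion(prediction, target_batch)
        # Backward pass: compute gradients of the loss
        loss.backward()
        # Update the model's parameters
        optimizer.step()
```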
7. Let's practice!
Let's get some training in.