1. Evaluating model performance

We have already seen how to train neural networks in PyTorch and how to adjust different hyperparameters. However, we still lack metrics to assess different iterations of our model. We'll now focus on model evaluation in PyTorch.

2. Training, validation and testing

We're already aware that when starting a machine learning project, we need to split our dataset into three subsets: training, validation, and testing. During training, we'll adjust model parameters (our weights and biases). During validation, we'll tune hyperparameters, such as learning rate and momentum. Just as in traditional machine learning, the test dataset is used only once to calculate final metrics.
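Here is a minimal sketch of such a split using torch.utils.data.random_split; the toy dataset size and the 70/15/15 proportions are illustrative assumptions, not values from the video.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical toy dataset: 100 samples, 8 features, 3 classes
dataset = TensorDataset(torch.randn(100, 8), torch.randint(0, 3, (100,)))

# Assumed 70% / 15% / 15% split into training, validation, and testing
train_set, val_set, test_set = random_split(dataset, [70, 15, 15])
```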

3. Model evaluation metrics

In this video, we'll focus on evaluating the following metrics in PyTorch: loss during training and validation, and accuracy during training and validation. Recall that in classification tasks, accuracy measures how often a model correctly predicts the ground truth labels.

4. Calculating training loss

Let's begin with loss. Training loss is computed by summing the loss at each iteration of the dataloader; at the end of each epoch, we take its mean. In PyTorch, we begin by initializing the training loss to zero. We then iterate through the training dataloader, run a forward pass (not shown), and compute the loss. In addition to calculating gradients and updating weights as usual (also not shown), we add the current loss to the running training loss. The loss tensor's .item() method returns the Python number contained in the tensor. Recall that one complete loop through the dataloader is one epoch. We calculate the mean loss by dividing the total training loss by the dataloader's length: the number of batches in our dataset.
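A minimal sketch of this bookkeeping, assuming a toy linear model, cross-entropy loss, and SGD; the model, data, and hyperparameter values are illustrative assumptions, not the ones from the video.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy setup: 100 samples, 8 features, 3 classes
features = torch.randn(100, 8)
labels = torch.randint(0, 3, (100,))
train_dataloader = DataLoader(TensorDataset(features, labels), batch_size=16)

model = nn.Linear(8, 3)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(5):
    training_loss = 0.0  # initialize training loss to zero
    for batch_features, batch_labels in train_dataloader:
        optimizer.zero_grad()
        outputs = model(batch_features)           # forward pass
        loss = criterion(outputs, batch_labels)   # compute the loss
        loss.backward()                           # calculate gradients
        optimizer.step()                          # update weights
        training_loss += loss.item()              # .item() -> Python float

    # mean loss over the epoch: total loss / number of batches
    epoch_loss = training_loss / len(train_dataloader)
    print(f"Epoch {epoch}: training loss = {epoch_loss:.4f}")
```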

5. Calculating validation loss

We take a similar approach to calculate validation loss. After each training epoch, we iterate over the dataloader containing the validation dataset. The validation epoch loop is slightly different. We first call the model's .eval() method to put it in evaluation mode, because some layers in PyTorch models behave differently at training versus validation stages. We also wrap the loop in a torch.no_grad() context, indicating that we will not perform gradient calculation in this epoch. Validation loss is then calculated similarly to training loss. We set the model back to training mode at the end of the validation epoch, so we can run another training epoch.
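A sketch of the validation epoch, continuing the previous example; it reuses the hypothetical model and criterion from above, and the validation dataloader here is an assumed toy one built the same way as the training dataloader.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical validation dataloader, built like the training one above
val_dataloader = DataLoader(
    TensorDataset(torch.randn(30, 8), torch.randint(0, 3, (30,))),
    batch_size=16,
)

validation_loss = 0.0
model.eval()  # some layers (e.g. dropout, batchnorm) behave differently here
with torch.no_grad():  # no gradient calculation during validation
    for batch_features, batch_labels in val_dataloader:
        outputs = model(batch_features)
        loss = criterion(outputs, batch_labels)
        validation_loss += loss.item()

# mean validation loss: total loss / number of batches
epoch_val_loss = validation_loss / len(val_dataloader)
model.train()  # back to training mode for the next training epoch
```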

6. Overfitting

Keeping track of training and validation losses during training helps us detect overfitting. Overfitting occurs when the model stops generalizing: performance on the validation dataset degrades. As the figure shows, overfitting is visible when validation loss stays high or starts rising while training loss keeps decreasing.

7. Calculating accuracy with torchmetrics

In addition to loss, we also want to track other metrics to evaluate how good our model is at predicting correct answers. To do so, we'll use a new package called torchmetrics. If we are performing classification, we can use torchmetrics to create an accuracy metric. On each iteration of the dataloader, we call this metric with the model outputs and the ground truth labels. The accuracy metric takes probabilities and integer class labels as inputs: the output variable here would be the probabilities returned by the softmax function. If the labels are one-hot encoded, we need the argmax function to obtain class indices instead of one-hot vectors. At the end of the epoch, we calculate total accuracy using the metric's .compute() method. Finally, we use .reset() to reset the metric for the next epoch. Accuracy is calculated in the same way for training and validation.
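A sketch using torchmetrics, continuing the hypothetical 3-class setup from the earlier examples; the task and num_classes arguments are assumptions for this toy problem.

```python
import torch
import torchmetrics

# Accuracy metric for the hypothetical 3-class problem
metric = torchmetrics.Accuracy(task="multiclass", num_classes=3)

model.eval()
with torch.no_grad():
    for batch_features, batch_labels in val_dataloader:
        outputs = torch.softmax(model(batch_features), dim=-1)  # probabilities
        # if labels were one-hot encoded, convert them first:
        # batch_labels = batch_labels.argmax(dim=-1)
        metric(outputs, batch_labels)  # accumulate per-batch statistics

accuracy = metric.compute()  # accuracy over the whole epoch
print(f"Validation accuracy: {accuracy:.4f}")
metric.reset()  # clear the metric's state for the next epoch
model.train()
```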

8. Let's practice!

Let's practice writing the evaluation loop and using the torchmetrics package.