1. Improving model performance
In our final video, we'll bring everything together and learn a recipe for tackling any deep learning problem.
2. Steps to maximize performance
First, we create a model that can overfit the training set. This will ensure that the problem is solvable.
We also set a performance baseline to aim for with the validation set.
We then need to reduce overfitting to increase performance on the validation set.
Finally, we fine-tune the hyperparameters to achieve the best possible performance.
3. Step 1: overfit the training set
It's useful to start with a single data point before overfitting the entire training set. This ensures the problem is solvable and helps catch potential bugs.
We modify the training loop to repeatedly train on a single example rather than iterating through the entire dataloader.
When the model is set up properly, it should quickly reach near-zero loss and 100% accuracy on that data point.
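As a minimal sketch, assuming a PyTorch setup (the tiny architecture, random data, and iteration count below are illustrative stand-ins for one real training example):

```python
import torch
import torch.nn as nn

# A small illustrative classifier; any existing architecture works here.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Stand-ins for a single example taken from the dataloader.
features = torch.randn(1, 8)
label = torch.tensor([1])

# Train repeatedly on that one example instead of iterating over the dataloader.
for i in range(500):
    optimizer.zero_grad()
    outputs = model(features)
    loss = criterion(outputs, label)
    loss.backward()
    optimizer.step()

# Loss should be near zero and the prediction should match the label.
print(loss.item(), (outputs.argmax(dim=1) == label).item())
```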
Once this step is successful, we scale up to the entire training set. At this stage, we use an existing model architecture large enough to overfit while keeping hyperparameters like the learning rate at their defaults.
4. Step 2: reduce overfitting
Now we need to create a model that generalizes well to maximize the validation accuracy.
A few strategies can help reduce overfitting, including dropout, data augmentation, weight decay, and reducing the model's capacity.
We need to keep track of the different parameters and the corresponding validation accuracy for each set of experiments.
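As a sketch, here is one way to combine two of these strategies, dropout in the model and weight decay in the optimizer, while logging each experiment (the specific values are illustrative, not results from the course):

```python
import torch
import torch.nn as nn

# Dropout randomly zeroes activations during training to reduce overfitting.
model = nn.Sequential(
    nn.Linear(8, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # the dropout probability is itself a hyperparameter
    nn.Linear(64, 2),
)

# Weight decay applies L2 regularization through the optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

# Record each experiment's settings alongside the validation accuracy it achieved.
experiments = []
experiments.append(
    {"dropout": 0.5, "weight_decay": 1e-4, "val_accuracy": 0.82}  # illustrative result
)
```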
5. Step 2: reduce overfitting
Reducing overfitting often comes at a cost: applying too much regularization can significantly degrade model performance.
The original model overfits the training set, achieving high accuracy but failing to generalize well to new data. In contrast, with too much regularization, the updated model shows a drop in training and validation accuracy, limiting its ability to learn effectively.
This highlights the importance of balancing overfitting reduction strategies while closely monitoring key metrics to find the best-performing model.
6. Step 3: fine-tune hyperparameters
Once we're satisfied with performance, the final step is fine-tuning hyperparameters. This is often done on optimizer settings like learning rate or momentum.
Grid search tests parameters at fixed intervals, for example, momentum values from 0.85 to 0.99 and learning rates from 10^-2 to 10^-6.
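A minimal grid-search sketch over those two ranges (the number of grid points is an illustrative choice):

```python
import numpy as np

# Test every combination of values spaced at fixed intervals.
momentums = np.linspace(0.85, 0.99, 5)
learning_rates = [10 ** -exp for exp in range(2, 7)]  # 1e-2 down to 1e-6

for momentum in momentums:
    for lr in learning_rates:
        # In practice, train and validate the model with each combination here.
        print(f"momentum={momentum:.3f}, lr={lr:.0e}")
```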
Random search takes a different approach. Instead of testing fixed values, it randomly selects them within a given range. The np.random.uniform(2, 6) function, for example, picks a float between 2 and 6, which we can use as a negative power of ten (10**-x) to sample learning rates across the same range.
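A comparable random-search sketch over the same ranges (the trial count is arbitrary):

```python
import numpy as np

for trial in range(10):
    momentum = np.random.uniform(0.85, 0.99)
    lr = 10 ** -np.random.uniform(2, 6)  # learning rate between 1e-6 and 1e-2
    # In practice, train and validate the model with each sampled setting here.
    print(f"trial {trial}: momentum={momentum:.3f}, lr={lr:.1e}")
```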
Random search is often more efficient, as it avoids unnecessary tests and increases the chance of finding optimal settings.
7. Let's practice!
Building a production-ready deep learning model requires continuous iteration. Let's practice!