1. Hyperparameter tuning
We've covered the process of creating baseline models. Now, let's talk about the subsequent steps: iterating on new features and optimizing hyperparameters.
2. Iterations
Here is a table with the results of our three baseline models for the Taxi Fare Prediction competition. Once we have a baseline and observe that our local validation score correlates with the Leaderboard, we start creating new features.
For example, we could add the hour of the ride to our model. It improves the results and we advance 40 positions on the Leaderboard.
Adding a distance feature improves our rank by another 60 positions.
And this process is endless. We keep creating new features, trying to improve our local validation and Public Leaderboard scores. However, it's impossible to check every new feature or small change on the Leaderboard, because the number of submissions to Kaggle is limited, usually to 5 attempts per day.
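As a minimal sketch, the two features mentioned above (hour of the ride and trip distance) could be created like this. The column names and the toy data are assumptions for illustration, not the competition's exact schema:

```python
import numpy as np
import pandas as pd

# Toy ride data (column names are assumptions; the real competition
# data has similar pickup/dropoff fields)
rides = pd.DataFrame({
    "pickup_datetime": pd.to_datetime(
        ["2019-01-01 08:15:00", "2019-01-01 18:40:00"]),
    "pickup_lat": [40.75, 40.71], "pickup_lon": [-73.99, -74.00],
    "dropoff_lat": [40.78, 40.73], "dropoff_lon": [-73.95, -73.98],
})

# New feature 1: hour of the ride, extracted from the pickup timestamp
rides["hour"] = rides["pickup_datetime"].dt.hour

# New feature 2: a simple coordinate-based distance proxy
# (sum of absolute latitude and longitude differences)
rides["distance"] = (
    np.abs(rides["dropoff_lat"] - rides["pickup_lat"])
    + np.abs(rides["dropoff_lon"] - rides["pickup_lon"])
)
```

After each such change, we re-check the local validation score before deciding whether the feature is worth keeping.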
3. Iterations
That is why we generally make submissions only after a couple of changes, just to check that our local validation score moves in the same direction as the Public Leaderboard score. For example, we might submit after the gradient boosting baseline, and again only after two new features are created.
4. Hyperparameter optimization
Feature engineering is the main source of score improvement in classic Machine Learning competitions (with tabular or time series data). Once we are out of ideas for feature engineering, we move on to hyperparameter optimization. This means we try to find a set of model parameters that further improves the validation score.
On the other hand, in Deep Learning competitions with text or image data, there is no need for feature engineering. Neural nets generate features on their own, while we need to specify the architecture and a list of hyperparameters. So, generally speaking, in Deep Learning competitions we only have to optimize the hyperparameters.
5. Ridge regression
The simplest hyperparameter example can be found in Ridge regression.
In basic least squares linear regression, we need to minimize the residual sum of squares, where y is the vector of true values and y hat is the model's predictions.
6. Ridge regression
Ridge regression introduces a regularization term to the loss function. It is the sum of squares of the model coefficients w. So, we are adding a penalty on the size of the coefficients.
The strength of this penalty is controlled by a hyperparameter alpha. It's not estimated by the model. Instead, we have to specify it manually, outside the model; that's why it's called a "hyperparameter".
And hyperparameter tuning helps us find the best alpha value automatically.
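In symbols, the two objectives described above can be written as follows (using y hat for the predictions, w for the coefficients, and alpha for the regularization strength, as in the narration):

```latex
% Ordinary least squares: minimize the residual sum of squares
\min_{w} \; \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2

% Ridge regression: the same loss plus an alpha-weighted
% penalty on the squared coefficients
\min_{w} \; \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
          + \alpha \sum_{j=1}^{p} w_j^2
```

Larger alpha values shrink the coefficients more strongly; alpha = 0 recovers ordinary least squares.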
7. Hyperparameter optimization strategies
There are a number of different strategies for finding the best set of hyperparameters. The most popular include:
Grid search. We select a discrete grid of possible hyperparameter values and loop through every possible combination.
Another approach is random search. In this case, we just specify a search range for each parameter, for example, 'from' and 'to' values. On each iteration, we sample a value from this range.
Another method is Bayesian optimization. Here we also need to specify the search space for the hyperparameters. The difference from random or grid search is that Bayesian optimization uses past evaluation results to choose the next hyperparameter values to evaluate.
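As a minimal illustration of the random search idea just described, we can sample candidate alpha values from a 'from'/'to' range. The bounds and the number of iterations here are arbitrary illustrative choices:

```python
import random

# Search range for alpha: the 'from' and 'to' bounds
# (illustrative values, not tuned for any particular dataset)
low, high = 0.01, 100.0

random.seed(0)  # for reproducibility
n_iterations = 10

# On each iteration, sample a candidate alpha uniformly from the range
candidate_alphas = [random.uniform(low, high) for _ in range(n_iterations)]
```

Each sampled candidate would then be evaluated on the validation set, just like a grid value in grid search.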
8. Grid search
In this course we'll cover only the grid search. If you would like to learn about the other techniques, do not hesitate to take DataCamp's course on hyperparameter tuning.
For instance, let's tune the alpha value for Ridge regression using grid search. First, we create a grid of possible alpha values.
Then, we create a results dictionary to store the scores. For each value in the grid,
we create a Ridge regression model with that specific alpha value,
calculate the validation score for this particular alpha,
and store the result in the dictionary. Once we've looped through all the grid values, we select the alpha that achieves the best validation score. That is our optimal hyperparameter value.
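The loop just described can be sketched as follows. The synthetic data, the alpha grid, and the use of mean squared error as the validation metric are assumptions for illustration; any train/validation split with a numeric metric works the same way:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the competition's train/validation split
rng = np.random.RandomState(42)
X = rng.randn(200, 5)
y = X @ np.array([1.0, 2.0, 0.5, -1.0, 3.0]) + rng.randn(200) * 0.5
X_train, X_val = X[:150], X[150:]
y_train, y_val = y[:150], y[150:]

# Step 1: grid of possible alpha values
alpha_grid = [0.01, 0.1, 1.0, 10.0, 100.0]

# Step 2: results dictionary to store the validation scores
results = {}

for alpha in alpha_grid:
    # Step 3: Ridge model with this specific alpha value
    model = Ridge(alpha=alpha)
    model.fit(X_train, y_train)

    # Step 4: validation score for this particular alpha
    preds = model.predict(X_val)

    # Step 5: store the score in the dictionary
    results[alpha] = mean_squared_error(y_val, preds)

# Select the alpha with the best (lowest) validation error
best_alpha = min(results, key=results.get)
```

With an error metric like MSE, "best" means the smallest value; for a score like accuracy, we would take the maximum instead.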
9. Let's practice!
All right, now it's your turn to tune hyperparameters!