1. Hyperparameter tuning
Now that we know how to evaluate model performance, let's explore how to optimize our model.
2. Hyperparameter tuning
Recall that we had to choose a value for alpha in ridge and lasso regression before fitting the model.
Likewise, before fitting and predicting with KNN, we chose n_neighbors.
Parameters that we specify before fitting a model, like alpha and n_neighbors, are called hyperparameters.
So, a fundamental step for building a successful model:
3. Choosing the correct hyperparameters
is choosing the correct hyperparameters.
We can try lots of different values, fit all of them separately, see how well they perform, and choose the best values!
This is called hyperparameter tuning.
When fitting different hyperparameter values, we use cross-validation to avoid overfitting the hyperparameters to the test set.
We can still split the data, but perform cross-validation on the training set.
We withhold the test set and use it for evaluating the tuned model.
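As a minimal sketch of that split, the code below uses a small synthetic stand-in for the course's sales dataset (the data and variable names are purely illustrative):

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

# Synthetic stand-in for the sales dataset (illustrative only).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([3.0, 1.5, 0.5]) + rng.normal(scale=0.5, size=200)

# Withhold a test set; it is only used to evaluate the tuned model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Cross-validation for hyperparameter tuning happens on the training set only.
kf = KFold(n_splits=5, shuffle=True, random_state=42)
```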
4. Grid search cross-validation
One approach for hyperparameter tuning is called grid search, where we choose a grid of possible hyperparameter values to try.
For example, we can search across two hyperparameters for a KNN model: the distance metric and the number of neighbors.
Here we have n_neighbors values from two to eleven in steps of three, and two metrics: euclidean and manhattan. Therefore, we can create a grid of values like this.
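As a sketch, that grid can be written as a dictionary whose keys match scikit-learn's KNN parameter names (the values mirror the slide):

```python
import numpy as np

# Grid from the slide: n_neighbors from 2 to 11 in steps of three,
# plus two distance metrics.
param_grid = {
    "n_neighbors": np.arange(2, 12, 3),      # array([ 2,  5,  8, 11])
    "metric": ["euclidean", "manhattan"],
}
```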
5. Grid search cross-validation
We perform k-fold cross-validation for each combination of hyperparameters. The mean scores for each combination are shown here.
6. Grid search cross-validation
We then choose hyperparameters that performed best, as shown here.
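To make the idea concrete, here is a rough sketch of what grid search does under the hood, reusing X_train, y_train, kf, and param_grid from the sketches above; GridSearchCV, shown next, automates exactly this loop.

```python
from itertools import product

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Cross-validate every (metric, n_neighbors) combination on the training
# set and keep the combination with the best mean score.
best_score, best_params = float("-inf"), None
for metric, n in product(param_grid["metric"], param_grid["n_neighbors"]):
    knn = KNeighborsRegressor(n_neighbors=int(n), metric=metric)
    scores = cross_val_score(knn, X_train, y_train, cv=kf)
    if scores.mean() > best_score:
        best_score = scores.mean()
        best_params = {"metric": metric, "n_neighbors": int(n)}

print(best_params, best_score)
```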
7. GridSearchCV in scikit-learn
Let's perform a grid search on a regression model using our sales dataset.
We import GridSearchCV from sklearn.model_selection.
We instantiate KFold.
We then specify the names and values of the hyperparameters we wish to tune as the keys and values of a dictionary, param_grid.
As always, we instantiate our model.
We then call GridSearchCV and pass it our model, the grid we wish to tune over and set cv equal to kf.
This returns a GridSearchCV object that we can then fit to the training data, and this fit performs the actual cross-validated grid search.
We can then print the model's best_params_ and best_score_ attributes, respectively, to retrieve the hyperparameters that perform best along with the best mean cross-validation score.
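Put together, the workflow might look like the following sketch; a ridge model with an illustrative alpha grid stands in for whatever the course uses, and X_train and y_train come from the earlier split.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold

# Cross-validation splitter.
kf = KFold(n_splits=5, shuffle=True, random_state=42)

# Hyperparameter names and candidate values; the keys must match the model's
# parameter names, and these alpha values are purely illustrative.
param_grid = {"alpha": np.linspace(0.0001, 1, 20)}

# Instantiate the model and the grid search, then fit to the training data;
# this fit performs the cross-validated grid search.
ridge = Ridge()
ridge_cv = GridSearchCV(ridge, param_grid, cv=kf)
ridge_cv.fit(X_train, y_train)

# Best hyperparameters and the best mean cross-validation score.
print(ridge_cv.best_params_, ridge_cv.best_score_)
```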
8. Limitations and an alternative approach
Grid search is great. However, the number of fits equals the number of hyperparameter combinations, that is, the product of the number of candidate values for each hyperparameter, multiplied by the number of folds. Therefore, it doesn't scale well!
So, performing 3-fold cross-validation for one hyperparameter with 10 values means 30 fits, while 10-fold cross-validation on three hyperparameters with 10 values each means 1,000 combinations and 10,000 fits!
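As a quick check of that arithmetic:

```python
# Number of fits = number of hyperparameter combinations x number of folds.
print(10 * 3)        # one hyperparameter, 10 values, 3-fold CV  -> 30 fits
print(10 ** 3 * 10)  # three hyperparameters, 10 values each, 10-fold CV -> 10,000 fits
```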
However, there is another way.
9. RandomizedSearchCV
We can perform a random search, which picks random hyperparameter values rather than exhaustively searching through all options. Let's demonstrate this approach.
We import RandomizedSearchCV from sklearn.model_selection.
We set up KFold and param_grid, and instantiate the model as before.
We call RandomizedSearchCV using the same arguments and variables as GridSearchCV. We can optionally set the n_iter argument, which determines the number of hyperparameter settings that are sampled. So five-fold cross-validation with n_iter set to two performs 10 fits.
Again we can access the best hyperparameters and their score. In this case it is able to find the best hyperparameters from our previous grid search!
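A sketch of this, reusing the ridge model, kf, param_grid, and training data from the grid search example above (n_iter=2 means only two settings are sampled):

```python
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

# Randomly sample n_iter hyperparameter settings instead of trying them all.
ridge = Ridge()
ridge_random_cv = RandomizedSearchCV(ridge, param_grid, cv=kf, n_iter=2,
                                     random_state=42)
ridge_random_cv.fit(X_train, y_train)

# Best sampled hyperparameters and their mean cross-validation score.
print(ridge_random_cv.best_params_, ridge_random_cv.best_score_)
```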
10. Evaluating on the test set
We can evaluate model performance on the test set by passing the test features and target to the random search object's .score method.
It actually performs slightly better than the best score in our grid search!
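Continuing the randomized search sketch above, the final evaluation might look like this (for a regressor, .score returns R-squared):

```python
# Evaluate the tuned model on the withheld test set.
test_score = ridge_random_cv.score(X_test, y_test)
print(test_score)
```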
11. Let's practice!
Now let's perform some hyperparameter tuning!