1. Grid vs. Random Search
Defining a hyperparameter grid is easy, right? So let's learn even more about it.
2. Grid search continued
In the previous lesson, we **fixed** the learning rate and the minimum number of training set samples required in a node to commence splitting, and we compared different options for the number of trees and for tree complexity.
Then we trained a gradient boosting model with repeated cross-validation.
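To make this concrete, here is a minimal sketch of that kind of Cartesian grid search with caret and gbm. The object names (train_data, the outcome column diabetes) and all grid values are hypothetical stand-ins, not the exact code from the previous lesson:

```r
library(caret)

# Cartesian grid: shrinkage (learning rate) and n.minobsinnode are fixed,
# while the number of trees and tree depth are compared
# (all values here are illustrative)
man_grid <- expand.grid(n.trees = c(100, 200, 250),
                        interaction.depth = c(1, 4, 6),
                        shrinkage = 0.1,
                        n.minobsinnode = 10)

# Repeated 5-fold cross-validation, repeated 3 times
fit_control <- trainControl(method = "repeatedcv",
                            number = 5,
                            repeats = 3)

# Gradient boosting model trained over the manual grid
# (train_data and its outcome column `diabetes` are hypothetical names)
set.seed(42)
gbm_model <- train(diabetes ~ .,
                   data = train_data,
                   method = "gbm",
                   trControl = fit_control,
                   tuneGrid = man_grid,
                   verbose = FALSE)
```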
3. Grid search with hyperparameter ranges
But what if we don't want to define a set of distinct values for each hyperparameter but instead want to define a range of values?
This is easy as well: just use the seq() function and define the value at which you want to start, the value at which to stop, and the increment by which to step between start and stop.
Here you see that you can now end up with non-integer values in your grid and that the grid will quickly grow much bigger.
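Sketched with the same hypothetical gbm hyperparameters as above, a range-based grid could look like this; the start, stop, and step values are purely illustrative:

```r
# Ranges instead of distinct values: seq(from, to, by)
big_grid <- expand.grid(n.trees = seq(from = 10, to = 300, by = 50),
                        interaction.depth = seq(from = 1, to = 10, by = 2),
                        shrinkage = seq(from = 0.05, to = 0.3, by = 0.05),  # non-integer values
                        n.minobsinnode = 10)

nrow(big_grid)  # 6 x 5 x 6 x 1 = 180 combinations: the grid grows quickly
```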
4. Grid search with many hyperparameter options
Let's see what happens if we use this grid in a gradient boosting model the same way we used the Cartesian grid before: as input to the tuneGrid argument of train.
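As a sketch, passing the bigger grid works just like before; big_grid, fit_control, and train_data are the hypothetical objects from the earlier snippets:

```r
# Same train() call as before, only the grid has changed
set.seed(42)
gbm_model_big <- train(diabetes ~ .,
                       data = train_data,
                       method = "gbm",
                       trControl = fit_control,
                       tuneGrid = big_grid,
                       verbose = FALSE)
```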
240 seconds to train! I'm sure you can see how quickly training would take forever to run if you keep increasing the number of hyperparameter values you want to tune!
5. Cartesian grid vs random search
So far, we have always compared all possible combinations of hyperparameters in our predefined grid. This method is called Cartesian grid search.
Here, you see a different way to plot the hyperparameter tuning results: instead of using the base R plot function, you can also feed the model object to the ggplot function and get a similar line plot of hyperparameter combinations and their corresponding accuracies.
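Assuming the fitted train object is called gbm_model (a hypothetical name carried over from the sketch above), the two plotting options would look like this:

```r
plot(gbm_model)    # base R line plot of the tuning results
ggplot(gbm_model)  # ggplot2 version of the same plot
```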
Even though we want to compare as many hyperparameters as possible in order to find the best combination for our model, Cartesian grid search becomes slow and computationally expensive very quickly.
So let's look at a faster alternative - random search.
With random search we no longer test all possible combinations of hyperparameters; instead, we randomly pick a specified number of hyperparameter combinations and evaluate only those for model performance.
6. Random search in caret
To use random search, caret's trainControl function offers an additional argument called search. Possible inputs to this argument are "grid" and "random".
The models built into caret include code to generate random tuning parameter combinations.
The total number of unique combinations is specified by the tuneLength option in train. We already got to know tuneLength in the first chapter, where we used it to define the number of tuning parameter values to compare in caret's automatic tuning function; there, all possible combinations of these hyperparameters were compared. Here we use it to define how many randomly picked hyperparameter combinations to evaluate. 5 is of course too few, and in reality you would want to test at least 100. But again, for demonstration purposes we go with the time-saving version here,
which still takes almost 1 minute to train.
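A minimal sketch of that setup, again with the hypothetical object names carried over from the earlier snippets:

```r
# Switch from grid search to random search in trainControl
fit_control_random <- trainControl(method = "repeatedcv",
                                   number = 5,
                                   repeats = 3,
                                   search = "random")

# tuneLength = 5 means only 5 random hyperparameter combinations are evaluated
# (fine for a demo; in practice you would want at least 100)
set.seed(42)
gbm_model_random <- train(diabetes ~ .,
                          data = train_data,
                          method = "gbm",
                          trControl = fit_control_random,
                          tuneLength = 5,
                          verbose = FALSE)
```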
7. Random search in caret
This is what the model output looks like now:
5 randomly picked hyperparameter combinations and their corresponding accuracy and kappa values are shown.
And - as always - the final values are given in the text at the bottom of the printed output.
One important thing to note is this: in caret, random search can NOT be combined with grid search! This means that the tuneLength argument cannot be used to sample from a customized grid.
8. Let's get coding!
Alright, now it's your turn to try out what you've learned!