
RandomizedSearchCV

1. RandomizedSearchCV

Now that we have discussed the basics of hyperparameter tuning, let's combine model validation with tuning to start creating the most accurate, validated models possible.

2. Grid searching hyperparameters

Consider a case where we have only two hyperparameters to choose from: the number of trees and the maximum depth. If we had five options for the number of trees and four options for the maximum depth, this would create 20 possible combinations of these parameters. Notice that they form a grid of possible parameter values. We could conduct a complete grid search, running our random forest model with each unique combination of the two hyperparameters.
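For concreteness, here is a minimal sketch of what that complete grid search could look like with scikit-learn's GridSearchCV. The specific value lists, the random seed, and the X_train/y_train data are assumptions for illustration, not values from the lesson.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Five options for the number of trees, four for the maximum depth:
# 5 * 4 = 20 unique combinations, and GridSearchCV fits every one of them.
param_grid = {
    "n_estimators": [10, 25, 50, 100, 200],
    "max_depth": [2, 4, 8, 16],
}

grid_search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=1111),
    param_grid=param_grid,
    cv=5,
)
# grid_search.fit(X_train, y_train)  # X_train / y_train assumed to exist
```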

3. Grid searching continued

There is one main benefit of grid searching: every possible combination of values will be tested. However, there is one major drawback: each additional parameter multiplies the number of combinations to test, so training time grows exponentially with the number of parameters. Therefore, grid searching is only practical with a limited number of parameters and a limited range of values for each.

4. Better methods

There are two excellent alternatives to grid search, each with its own advantages: random searching, which randomly samples hyperparameter values from the specified ranges, and Bayesian optimization, which uses the results of past runs to choose the hyperparameters for the next run. Bayesian approaches are out of scope for this course, so we will focus on random searching methods.

5. Random search

To implement a random search, we can use scikit-learn's RandomizedSearchCV(). This method randomly selects hyperparameters for each model run from a user-defined hyperparameter space. RandomizedSearchCV() requires a dictionary of hyperparameters and their possible values. Here we have specified four max depths, nine max features, and nine min_samples_split options. A grid search with these possible parameters would take 324 total model runs, since 4 times 9 times 9 is 324. Using a random search, however, we can get similar results with only 30 or 40 runs.
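As a sketch, a hyperparameter dictionary with that shape might look like the following; the exact values are illustrative assumptions, chosen only to give four, nine, and nine options respectively.

```python
# Illustrative hyperparameter space: 4 * 9 * 9 = 324 possible combinations.
param_dist = {
    "max_depth": [4, 8, 12, None],            # four max depth options
    "max_features": list(range(2, 11)),       # nine max features options
    "min_samples_split": list(range(2, 11)),  # nine min_samples_split options
}
# A full grid search would need all 324 runs; a random search samples
# only a few dozen of these combinations.
```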

6. Random search parameters

To use this method, we need to specify a few other parameters. The parameter n_iter specifies the number of models to run, estimator allows us to set the base model, such as a random forest regression model, and scoring allows us to specify a scoring function.
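As a rough sketch of where those arguments go (the toy search space, the metric string, and the n_iter value below are placeholders, not the lesson's settings):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

toy_params = {"max_depth": [2, 4, 8]}  # placeholder space to show the call shape

search = RandomizedSearchCV(
    estimator=RandomForestRegressor(),   # the base model
    param_distributions=toy_params,
    n_iter=3,                            # number of models to run
    scoring="neg_mean_absolute_error",   # scoring function
)
```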

7. Setting RandomizedSearchCV parameters

Aside from setting up the parameter distributions, we need to create a model and a scoring function. The estimator specified here uses a RandomForestRegressor() model with 20 trees. We have also specified the mean absolute error as the scoring function.
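A sketch of that setup, assuming scikit-learn's RandomForestRegressor and make_scorer; the random seed and the greater_is_better flag are our own choices rather than details from the slide:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import make_scorer, mean_absolute_error

# Base model: a random forest regressor with 20 trees.
rfr = RandomForestRegressor(n_estimators=20, random_state=1111)

# Mean absolute error as the scoring function; greater_is_better=False
# tells scikit-learn that lower error means a better parameter set.
scorer = make_scorer(mean_absolute_error, greater_is_better=False)
```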

8. RandomizedSearchCV implemented

Let's finally implement the RandomizedSearchCV method. We use our model, rfr, and the parameter distribution, specify 40 parameter sets, and set the cv value to 5. Ah! So hopefully the "CV" on the end of this method helps us see why we are even discussing hyperparameter tuning in a course about model validation! The cross-validation techniques we have been discussing will be used with random searching to help us select the best model for our data. After all, if we test 40 different parameter sets, how do we determine which one is the best? And how do we appropriately compare their results? We have to use the techniques we have learned so far in this course.
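Putting the pieces together, a sketch of that call could look like this (rfr, param_dist, and scorer come from the earlier sketches; the random seed is an added assumption):

```python
from sklearn.model_selection import RandomizedSearchCV

# Sample 40 parameter sets from param_dist and score each candidate model
# with 5-fold cross-validation.
random_search = RandomizedSearchCV(
    estimator=rfr,
    param_distributions=param_dist,
    n_iter=40,
    cv=5,
    scoring=scorer,
    random_state=1111,
)
```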

9. RandomizedSearchCV implemented

After using RandomizedSearchCV, we should have a validated model with better accuracy than the base implementation of that model. To actually run the random search, we use the .fit() method, just like any other scikit-learn model.
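Continuing the sketch above, fitting kicks off the search; X_train and y_train stand in for whatever training data you have on hand:

```python
# Fitting runs the random search: each of the 40 sampled parameter sets is
# evaluated with 5-fold cross-validation, and the best one is refit on the
# full training data (scikit-learn's default refit=True behavior).
random_search.fit(X_train, y_train)
```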

10. Let's explore some examples!

We will save exploring the output of the RandomizedSearchCV() method for the next lesson. For now, let's run through some examples.