
Random Search in Scikit Learn

1. Random Search in Scikit Learn

In this lesson we will be introduced to Scikit Learn's RandomizedSearchCV module. Just like GridSearchCV, it is a more efficient way of undertaking a random search than doing it manually, and it lets us easily capture extra information about our training.

2. Comparing to GridSearchCV

Since we have already covered GridSearchCV, we don't need to learn many new steps. Let's recall the steps for a grid search:

1. Decide an algorithm to tune the hyperparameters for (sometimes called an "estimator").
2. Define which hyperparameters we will tune.
3. Define a range of values for each hyperparameter.
4. Set a cross-validation scheme.
5. Define a scoring function so we can decide which grid square (model) was the best.
6. Decide whether to include extra useful information or functions.
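As a rough sketch, the six steps above might map onto code like this (the estimator, hyperparameter names, and value ranges here are illustrative choices, not taken from the lesson):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# A tiny synthetic dataset so the example runs quickly
X, y = make_classification(n_samples=100, random_state=42)

# Step 1: the algorithm (estimator) to tune
estimator = GradientBoostingClassifier(random_state=42)

# Steps 2 & 3: which hyperparameters to tune, and a range of values for each
param_grid = {"learning_rate": [0.01, 0.1, 0.5],
              "max_depth": [2, 3]}

# Steps 4-6: cross-validation scheme, scoring function, and extra information
grid_search = GridSearchCV(estimator=estimator,
                           param_grid=param_grid,
                           cv=3,
                           scoring="accuracy",
                           return_train_score=True)
grid_search.fit(X, y)
print(grid_search.best_params_)
```

Every cell of this 3-by-2 grid (six models) is trained and cross-validated.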

3. Comparing to Grid Search

There is only one difference when undertaking a random search: we need to decide how many hyperparameter combinations to randomly sample for building models, and undertake this sampling before we model. And that's pretty much it!

4. Comparing Scikit Learn Modules

It is therefore no surprise to see how similar the two Scikit Learn modules are. See them here side by side. It may not be obvious what differs, since far more is the same than different!

5. Key differences

There are really only two key differences:

- n_iter: the number of samples for the random search to take from your grid. In the previous example you did 300.
- param_distributions: slightly different from param_grid. You can optionally give information on how to sample, such as a particular distribution you provide. If you just give a list, as we have been doing, the default is to sample uniformly, meaning every item in the list (combination) has an equal chance of being chosen.
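To make that second difference concrete, here is a small sketch of both options: a plain list, and a scipy.stats distribution object whose rvs() method RandomizedSearchCV will call to draw samples. The parameter names and ranges are illustrative:

```python
from scipy.stats import uniform

# Option 1: a plain list is sampled uniformly, so each value
# has an equal chance of being chosen
param_distributions = {"min_samples_leaf": [1, 5, 10, 20]}

# Option 2: supply a distribution object instead; RandomizedSearchCV
# draws each sample by calling its .rvs() method
param_distributions = {"learning_rate": uniform(loc=0.01, scale=0.5),
                       "min_samples_leaf": [1, 5, 10, 20]}

# Drawing one sample manually, the way RandomizedSearchCV would
sample = param_distributions["learning_rate"].rvs(random_state=42)
print(sample)  # a float between 0.01 and 0.51
```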

6. Build a RandomizedSearchCV Object

Now let's create a RandomizedSearchCV object, including the key changes we need to make. Creating the lists of values and setting up the grid looks very similar. We first create the lists of hyperparameter values using the np.linspace() and range() functions, then set up the dictionary grid of hyperparameter values. The crucial small difference is at the end: defining how many samples to take.
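A sketch of that setup, with illustrative value ranges (the hyperparameter names match the lesson, but the exact bounds and counts are assumptions):

```python
import numpy as np

# Lists of hyperparameter values
learning_rate_list = np.linspace(0.01, 1.5, 150)
min_samples_leaf_list = list(range(20, 65))

# Dictionary "grid" of hyperparameter values
param_grid = {"learning_rate": learning_rate_list,
              "min_samples_leaf": min_samples_leaf_list}

# The crucial small difference: how many combinations to sample
number_models = 10
```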

7. Build a RandomizedSearchCV Object

Now we create the random search object. Note the slightly different name for the parameter grid, which is now called 'param_distributions', as well as our new n_iter input for how many combinations to select and train models with.
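A self-contained sketch of creating and fitting the object (the dataset, estimator settings, and value ranges are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# A small synthetic dataset so the example runs quickly
X, y = make_classification(n_samples=200, random_state=42)

param_grid = {"learning_rate": np.linspace(0.01, 1.5, 150),
              "min_samples_leaf": list(range(20, 65))}

# Note: param_distributions (not param_grid), plus the new n_iter input
random_search = RandomizedSearchCV(
    estimator=GradientBoostingClassifier(random_state=42),
    param_distributions=param_grid,
    n_iter=10,
    cv=3,
    scoring="accuracy",
    random_state=42)
random_search.fit(X, y)
```

Only 10 randomly chosen combinations are trained, rather than all 150 x 45 grid cells.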

8. Analyze the output

The attributes that form the output of RandomizedSearchCV are exactly the same as those of the GridSearchCV module. However, it would be interesting to see which values it sampled. We can visualize them using the code from the previous lesson, but where do we get the details on the hyperparameters used? Do you remember? That's right: in the cv_results_ dictionary that was returned, in the relevant param_ columns. Let's extract the learning_rate and min_samples_leaf values used so we can plot them.
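A sketch of that extraction; the fit is repeated here so the snippet stands alone, and the dataset and ranges are again illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=200, random_state=42)
random_search = RandomizedSearchCV(
    estimator=GradientBoostingClassifier(random_state=42),
    param_distributions={"learning_rate": np.linspace(0.01, 1.5, 150),
                         "min_samples_leaf": list(range(20, 65))},
    n_iter=10, cv=3, random_state=42)
random_search.fit(X, y)

# cv_results_ is a dictionary; a DataFrame makes it easier to inspect.
# Each hyperparameter gets its own 'param_' column.
cv_results_df = pd.DataFrame(random_search.cv_results_)
lr_sampled = cv_results_df["param_learning_rate"]
leaf_sampled = cv_results_df["param_min_samples_leaf"]
print(lr_sampled.head())
```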

9. Analyze the output

Now we can plot our results. We set the x and y limits using NumPy's min and max functions over our lists of hyperparameter values so that we can best see the coverage. Then we plot these combinations as a scatter plot. For computational efficiency we only ran 10 models this time; otherwise it would take a while!
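A plotting sketch along those lines. To keep it self-contained, the 10 sampled combinations are drawn here with NumPy rather than taken from a fitted search object, so the values are purely illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted use
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for the 10 sampled combinations (illustrative values)
rng = np.random.default_rng(42)
learning_rate_list = np.linspace(0.01, 1.5, 150)
min_samples_leaf_list = np.arange(20, 65)
lr_sampled = rng.choice(learning_rate_list, 10)
leaf_sampled = rng.choice(min_samples_leaf_list, 10)

fig, ax = plt.subplots()
# Axis limits from the min and max of the full value lists,
# so we can see how much of the space was covered
ax.set_xlim(np.min(learning_rate_list), np.max(learning_rate_list))
ax.set_ylim(np.min(min_samples_leaf_list), np.max(min_samples_leaf_list))
ax.scatter(lr_sampled, leaf_sampled)
ax.set_xlabel("learning_rate")
ax.set_ylabel("min_samples_leaf")
fig.savefig("random_search_coverage.png")
```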

10. Analyze the output

You will notice this plot looks very similar to the one from earlier, when we plotted hyperparameter combinations without actually undertaking the model creation. Now that we have actually run a search, the picture is much the same: random search gives wide coverage of the space of possible hyperparameters, but that coverage is very patchy.

11. Let's practice!

Now it's your turn to try your hand at Scikit Learn's RandomizedSearchCV module!