
Introducing Random Search

1. Introducing Random Search

In this lesson we will cover the concept of random search, how it differs from grid search, as well as some advantages and disadvantages of this method.

2. What you already know

A random search is very similar to a grid search in several key steps. We still define an estimator, which hyperparameters to tune and the range of values for each hyperparameter. Similarly, we still set a cross-validation scheme and scoring function. However, when it comes to undertaking the search, rather than trying every single combination, you randomly sample N combinations and try those out. This may seem like a really odd thing to do. Why would you do this, and why does it work?
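For context, here is a minimal sketch of those shared steps using scikit-learn's RandomizedSearchCV; the estimator, hyperparameter ranges, scoring choice and number of samples below are illustrative assumptions, not values from this lesson.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Estimator plus the hyperparameters (and value ranges) we want to tune
param_distributions = {
    "max_depth": list(range(2, 20)),
    "min_samples_leaf": list(range(1, 50)),
}

# Randomly sample n_iter combinations, each scored with 5-fold cross-validation
random_search = RandomizedSearchCV(
    estimator=RandomForestClassifier(),
    param_distributions=param_distributions,
    n_iter=10,
    cv=5,
    scoring="accuracy",
)

# random_search.fit(X, y)  # X, y would be your training features and labels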

3. Why does this work?

An important paper by James Bergstra and Yoshua Bengio outlines that the reason this works can be explained by two fundamental principles. Firstly, not every hyperparameter is equally important (we hinted at this in the grid search section). Secondly, a little trick of probability.

4. A probability trick

Let's explain the probability trick. Let's say that we have this grid. It has 100 cells, so 100 different models: 10 different values for each of two hyperparameters. Let's say that these 5 models are the best, highlighted in green. How many models would we need to run with random search to have a 95% chance of getting one of the green squares?

5. A probability trick

Let's consider how likely it is that we continue to completely miss the good models if we randomly select hyperparameter combinations uniformly. On our first trial we have a 5% chance of getting one of these squares, as they are 5 squares out of 100. Therefore we have a (1 - 0.05) chance of missing these squares. If we do a second trial, we now have a (1 - 0.05) x (1 - 0.05) chance of missing that range. For a third trial we have a (1 - 0.05) x (1 - 0.05) x (1 - 0.05) chance of missing that range. In fact, with n trials we have a (1 - 0.05)^n chance that every single trial misses all the good models.
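To make the compounding concrete, here is a quick check of those miss probabilities, assuming we sample cells uniformly with replacement as in the argument above.

# Probability that n uniform random draws all miss the 5 good cells out of 100
p_miss_one = 1 - 0.05

for n in (1, 2, 3, 10):
    print(n, p_miss_one ** n)

# 1 -> 0.95, 2 -> 0.9025, 3 -> 0.857375, 10 -> roughly 0.60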

6. A probability trick

So how many trials do we need to have a high chance of landing in the good region? We know that the probability of missing everything is (1-0.05)^n, so the probability of getting something in that area must be 1 - (miss everything), which is 1-(1-0.05)^n. Setting 1-(1-0.05)^n >= 0.95 and solving for n (without going into too much detail), we get the answer n >= 59.
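As a quick sanity check, the same answer falls out of taking logarithms; this is just the arithmetic from the slide, not course code.

import math

# Smallest n with 1 - (1 - 0.05)^n >= 0.95, i.e. 0.95^n <= 0.05
n = math.ceil(math.log(0.05) / math.log(0.95))
print(n)  # 59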

7. A probability trick

So what does that all mean? With relatively few trials we can get close to our maximum score with a relatively high probability. In essence, it is very unlikely that you will continue to miss everything for a long time, whereas a grid search may spend lots of time covering a bad area.

8. Some important notes

There are some important things to keep in mind with random search. Your possible score is still only as good as the grid you set! If you set a bad grid to sample from, you will not get a great model. Remember, to fairly compare this to grid search, you need to have the same modeling 'budget': for example, 200 models in grid search and 200 models in random search.

9. Creating a random sample of hyperparameters

We can create our own random sample of hyperparameter combinations. We first set the hyperparameter lists as you have done before, with NumPy's linspace and a range function. You can then create a single list of hyperparameter combinations that we can sample from, using itertools' product function. Then we randomly select 100 models from this list using NumPy's handy random choice function, which gives us 100 random indices we can use to index into the created list in the last line.
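A minimal sketch of that recipe is below; the hyperparameter names, ranges and sample size are illustrative assumptions rather than the exact values used in the course.

import numpy as np
from itertools import product

# Hyperparameter lists (illustrative ranges)
learn_rate_list = np.linspace(0.01, 1.5, 200)
min_samples_leaf_list = list(range(10, 60))

# Every possible combination of the two hyperparameters
combinations_list = [list(x) for x in product(learn_rate_list, min_samples_leaf_list)]

# Randomly pick 100 indices (without replacement) and use them to select combinations
random_indices = np.random.choice(range(len(combinations_list)), 100, replace=False)
random_combinations = [combinations_list[i] for i in random_indices]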

10. Visualizing a Random Search

A visual inspection of the hyperparameter values chosen by random search is a nice way to demonstrate how it works. Notice how the coverage is very wide, but it does not cover every area thoroughly?
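A minimal sketch of such a plot, assuming matplotlib and the random_combinations list sampled in the previous sketch:

import matplotlib.pyplot as plt

# Scatter the sampled (learning rate, min samples leaf) pairs to see the coverage
x = [combo[0] for combo in random_combinations]
y = [combo[1] for combo in random_combinations]

plt.scatter(x, y, c="blue")
plt.xlabel("learn_rate")
plt.ylabel("min_samples_leaf")
plt.title("Random Search Hyperparameters")
plt.show()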

11. Let's practice!

Let's practice sampling and visualizing our own random search!