
Grid and random search with H2O

1. Grid and random search with H2O

Like caret and mlr, H2O supports both Cartesian grid search and random search.

2. Hyperparameters in H2O models

We can find an overview of all hyperparameters in the help for each model function. A few of the hyperparameters for gradient boosting models are:

- the number of trees
- the maximum tree depth
- the fewest allowed observations in a leaf and
- the learning rate, optionally with scaling
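As a minimal sketch (not the course's exact code), these hyperparameters map onto arguments of h2o.gbm. The objects x, y, train and valid are assumed to exist as prepared in the next section, and the values shown here are arbitrary:

```r
library(h2o)

# Illustrative values only; x, y, train and valid are assumed to be
# defined as in the data preparation recap below.
gbm_model <- h2o.gbm(
  x = x, y = y,
  training_frame = train,
  validation_frame = valid,
  ntrees = 100,                 # number of trees
  max_depth = 5,                # maximum tree depth
  min_rows = 10,                # fewest allowed observations in a leaf
  learn_rate = 0.05,            # learning rate
  learn_rate_annealing = 0.99,  # optional scaling of the learning rate
  seed = 42
)
```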

3. Preparing our data for modeling with H2O

Before we define hyperparameter grids, let's briefly go over how we prepared our data:

- we converted the data to an H2O frame
- defined features and target and
- split into training, validation & test data
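A sketch of those three steps, assuming a hypothetical R data frame my_data with a target column named "target":

```r
library(h2o)
h2o.init()

# Convert the R data frame to an H2OFrame (my_data is a placeholder)
data_hf <- as.h2o(my_data)

# Define features and target
y <- "target"
x <- setdiff(colnames(data_hf), y)

# Split into training, validation & test data (70/15/15)
splits <- h2o.splitFrame(data_hf, ratios = c(0.7, 0.15), seed = 42)
train <- splits[[1]]
valid <- splits[[2]]
test  <- splits[[3]]
```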

4. Defining a hyperparameter grid

For Cartesian grid search, we define a list of values for the hyperparameters we want to tune. This list can be fed into the h2o.grid function as input to the hyper_params argument. Additional arguments to h2o.grid can be:

- the algorithm, here gbm
- a grid id
- features and target
- training and validation data and
- a seed for random number generation

H2O will now train a model for every possible combination of hyperparameters from our grid. After training is complete, we can look at the tuning results with the h2o.getGrid function.
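A sketch of such a grid, with illustrative values chosen so that there are 3 x 3 x 3 = 27 combinations (matching the 27 models discussed below); the grid id and seed are assumptions:

```r
# Hyperparameter grid: one model per combination (27 in total)
gbm_params <- list(
  ntrees = c(50, 100, 150),
  max_depth = c(3, 5, 7),
  learn_rate = c(0.01, 0.05, 0.1)
)

gbm_grid <- h2o.grid(
  algorithm = "gbm",
  grid_id = "gbm_grid",
  x = x, y = y,
  training_frame = train,
  validation_frame = valid,
  hyper_params = gbm_params,
  seed = 42
)
```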

5. Examining a grid object

To examine the tuning results, h2o.getGrid takes the grid id we defined in the h2o.grid call, along with a metric and the order by which to sort the models. The summary of this grid object tells us which hyperparameters were tuned and how many models were trained (and how many failed).
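For example, assuming the grid id "gbm_grid" from the sketch above and a classification task sorted by accuracy:

```r
# Retrieve the grid results, sorted by validation accuracy
gbm_gridperf <- h2o.getGrid(
  grid_id = "gbm_grid",
  sort_by = "accuracy",
  decreasing = TRUE
)

# Which hyperparameters were tuned, how many models were trained (and failed)
summary(gbm_gridperf)
```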

6. Extracting the best model from a grid

All 27 models in our grid are given a unique model id, which we can use to extract any of the models with the h2o.getModel function. Usually, we will want to extract the best model. Because we sorted by decreasing accuracy, the first model in the sorted table returned by h2o.getGrid will have the highest accuracy. The model summary gives us an overview of the hyperparameters used.
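Continuing the sketch, the sorted grid object stores the model ids, so the first entry belongs to the best model:

```r
# The grid was sorted by decreasing accuracy, so the first id is the best model
best_gbm <- h2o.getModel(gbm_gridperf@model_ids[[1]])

# Overview of the hyperparameters used by the best model
summary(best_gbm)
```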

7. Extracting the best model from a grid

The extracted best model can now be treated as a regular H2O model. We can, for example, use the h2o.performance function to evaluate its performance on test data.
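For instance, with the test split from the earlier sketch:

```r
# Evaluate the best model on held-out test data
h2o.performance(best_gbm, newdata = test)
```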

8. Random search with H2O

Random search in H2O also takes a list of hyperparameter values, just like the grid search example before. In contrast, we now don't want to train models for every possible combination of our defined grid, but only for a randomly sampled subset of those combinations. Thus, we need an additional list that defines the search strategy "RandomDiscrete". We have several options for controlling how the random search is performed:

- set the maximum number of models to train with max_models
- define a stopping metric, rounds and tolerance or
- set the maximum run time, as here with 60 seconds

This search criteria object can then be passed to the search_criteria argument of h2o.grid.
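A sketch of a random search, reusing the hypothetical gbm_params grid from above; the 60-second limit matches the example in the text, and the commented lines show the alternative options:

```r
# Search criteria: randomly sample combinations instead of trying them all
search_criteria <- list(
  strategy = "RandomDiscrete",
  max_runtime_secs = 60          # stop the search after 60 seconds
  # alternatively: max_models = 25,
  # or stopping_metric / stopping_rounds / stopping_tolerance
)

gbm_random_grid <- h2o.grid(
  algorithm = "gbm",
  grid_id = "gbm_random_grid",
  x = x, y = y,
  training_frame = train,
  validation_frame = valid,
  hyper_params = gbm_params,
  search_criteria = search_criteria,
  seed = 42
)
```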

9. Stopping criteria

Instead of defining the maximum run time or the maximum number of models to train during random search, we can define early stopping criteria. Early stopping is calculated on the validation data and works with three arguments:

- stopping_metric
- stopping_rounds and
- stopping_tolerance

If the performance metric doesn't improve by at least the stopping tolerance for a number of consecutive rounds, model training will stop. In this example, training stops if the mean per class error fails to improve by at least 0.0001 six times in a row. For the stopping metric, we can choose between several options, such as:

- mean residual deviance (the default for regression tasks)
- logloss (the default for classification tasks)
- mean squared error (MSE), etc.

The remaining training code and model output look just like before.
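A sketch of the early-stopping variant described above (stop if the mean per class error does not improve by at least 0.0001 for 6 consecutive rounds); the grid id is again an assumption:

```r
# Early stopping instead of max_models / max_runtime_secs
search_criteria <- list(
  strategy = "RandomDiscrete",
  stopping_metric = "mean_per_class_error",
  stopping_rounds = 6,
  stopping_tolerance = 0.0001
)

gbm_stopped_grid <- h2o.grid(
  algorithm = "gbm",
  grid_id = "gbm_stopped_grid",
  x = x, y = y,
  training_frame = train,
  validation_frame = valid,
  hyper_params = gbm_params,
  search_criteria = search_criteria,
  seed = 42
)
```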

10. Time to practice!

Time to practice!