Exercise

Tune random forest hyperparameters

As with any model, we want to optimize performance by tuning hyperparameters. Random forests have many hyperparameters, but the most important is often the number of features sampled at each split, which is max_features in sklearn's RandomForestRegressor. Because random forests have randomness built in, we also set random_state so that our results are reproducible.
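To illustrate why random_state matters, here is a minimal sketch that fits the same forest twice with the same seed and checks that the predictions match. The synthetic data, the helper name fit_predict, and the values n_estimators=20 and seed 42 are assumptions chosen only for this illustration; they are not part of the exercise.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (an assumption for illustration): 200 rows, 8 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = X[:, 0] + rng.normal(scale=0.1, size=200)


def fit_predict(seed):
    """Fit a small forest with the given seed and return its predictions on X."""
    model = RandomForestRegressor(n_estimators=20, max_features=4, random_state=seed)
    return model.fit(X, y).predict(X)


# Two fits with the same random_state should give identical predictions.
same = np.array_equal(fit_predict(42), fit_predict(42))
print(same)
```

Leaving random_state unset lets the bootstrap samples and feature subsets change from run to run, so scores for the same hyperparameters can shift slightly between runs.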

Usually, we could use sklearn's GridSearchCV class to search hyperparameters, but with a financial time series we don't want its default cross-validation, because the folds mix older and newer data and the model would be trained on data from after the period it is evaluated on. Instead, we want to fit our models on the oldest data and evaluate on the newest data. So we'll use sklearn's ParameterGrid to create the combinations of hyperparameters to search.
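As a rough sketch of what ParameterGrid produces, the snippet below expands a dictionary of lists into one dictionary per combination. The grid values are the ones named in the instructions for this exercise; the variable name combos is an assumption for illustration.

```python
from sklearn.model_selection import ParameterGrid

# Each key maps to a list of candidate values for that hyperparameter.
grid = {"n_estimators": [200], "max_features": [4, 8]}

# ParameterGrid yields one dict per combination of values.
combos = list(ParameterGrid(grid))
for g in combos:
    print(g)
```

Each dictionary g can be passed to a model with set_params(**g), which is what makes a plain loop over the grid a workable substitute for GridSearchCV here.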

Instructions
  • Set the n_estimators hyperparameter to be a list with one value (200) in the grid dictionary.
  • Set the max_features hyperparameter to be a list containing 4 and 8 in the grid dictionary.
  • Fit the random forest regressor model (rfr, already created for you) to the train_features and train_targets with each combination of hyperparameters, g, in the loop.
  • Calculate \(R^2\) by calling rfr.score() on test_features and the corresponding test targets, and append the result to the test_scores list.
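The steps above can be sketched end to end as follows. This is only one possible version under stated assumptions: the data is synthetic, the 80/20 chronological split, the name test_targets, and the random_state value of 42 are illustrative choices, and rfr is created here rather than supplied by the exercise.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import ParameterGrid

# Synthetic stand-in for a time-ordered dataset (assumption for illustration).
rng = np.random.default_rng(0)
features = rng.normal(size=(300, 10))
targets = 0.5 * features[:, 0] + rng.normal(scale=0.1, size=300)

# Chronological split: oldest 80% for training, newest 20% for testing.
split = int(0.8 * len(features))
train_features, test_features = features[:split], features[split:]
train_targets, test_targets = targets[:split], targets[split:]

grid = {"n_estimators": [200], "max_features": [4, 8], "random_state": [42]}
combos = list(ParameterGrid(grid))

rfr = RandomForestRegressor()
test_scores = []
for g in combos:
    rfr.set_params(**g)                     # apply this combination
    rfr.fit(train_features, train_targets)  # fit on the oldest data
    # score() returns R^2 for regressors; evaluate on the newest data
    test_scores.append(rfr.score(test_features, test_targets))

# Pick the combination with the highest test R^2.
best_idx = int(np.argmax(test_scores))
print(test_scores[best_idx], combos[best_idx])
```

Because no shuffling happens before the split, every test row comes after every training row, which is the property the loop is meant to preserve.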