Tune random forest hyperparameters
As with all models, we want to optimize performance by tuning hyperparameters. Random forests have many hyperparameters, but the most important is often the number of features we sample at each split, or `max_features` in `RandomForestRegressor` from the sklearn library. For models like random forests that have randomness built in, we also want to set the `random_state` so that our results are reproducible.
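As a minimal sketch of these two settings (on synthetic data; the variable names here are illustrative, not the exercise's preloaded ones):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))           # 100 samples, 8 features
y = 2 * X[:, 0] + rng.normal(size=100)  # target driven mostly by feature 0

# max_features limits how many features are considered at each split;
# random_state fixes the forest's randomness so refitting reproduces results
model = RandomForestRegressor(n_estimators=50, max_features=4, random_state=42)
model.fit(X, y)
print(model.score(X, y))  # in-sample R^2
```

Refitting with the same `random_state` on the same data gives identical predictions, which is what makes results reproducible.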
Usually, we can use sklearn's `GridSearchCV()` to search hyperparameters, but with a financial time series we don't want to do cross-validation due to data mixing. We want to fit our models on the oldest data and evaluate on the newest data. So we'll use sklearn's `ParameterGrid` to create combinations of hyperparameters to search.
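A quick look at what `ParameterGrid` produces (the values here are illustrative): it takes a dictionary mapping each hyperparameter to a list of candidate values and yields one dictionary per combination.

```python
from sklearn.model_selection import ParameterGrid

# Two candidate values for max_features, one for n_estimators -> 2 combinations
grid = {'n_estimators': [200], 'max_features': [4, 8]}
combos = list(ParameterGrid(grid))
print(combos)
# Each entry is a plain dict, e.g. {'max_features': 4, 'n_estimators': 200},
# ready to be passed to set_params(**g)
```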
This exercise is part of the course Machine Learning for Finance in Python.
Exercise instructions
- Set the `n_estimators` hyperparameter to be a list with one value (200) in the `grid` dictionary.
- Set the `max_features` hyperparameter to be a list containing 4 and 8 in the `grid` dictionary.
- Fit the random forest regressor model (`rfr`, already created for you) to the `train_features` and `train_targets` with each combination of hyperparameters, `g`, in the loop.
- Calculate R² by using `rfr.score()` on `test_features` and `test_targets`, and append the result to the `test_scores` list.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Create a dictionary of hyperparameters to search
grid = {____, 'max_depth': [3], 'max_features': ____, 'random_state': [42]}
test_scores = []

# Loop through the parameter grid, set the hyperparameters, and save the scores
for g in ParameterGrid(grid):
    rfr.set_params(**g)  # ** is "unpacking" the dictionary
    rfr.fit(____, ____)
    test_scores.append(rfr.score(____, ____))

# Find best hyperparameters from the test score and print
best_idx = np.argmax(test_scores)
print(test_scores[best_idx], ParameterGrid(grid)[best_idx])
```
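For reference, here is a self-contained sketch of how the completed loop runs end to end. The data is synthetic, and `rfr`, `train_features`, `train_targets`, `test_features`, and `test_targets` stand in for the exercise's preloaded variables; note the time-ordered split (oldest rows for training, newest for testing) rather than cross-validation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import ParameterGrid

# Synthetic "time series" features and targets (stand-ins for the real data)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=300)

# Time-ordered split: fit on the oldest data, evaluate on the newest
train_features, test_features = X[:250], X[250:]
train_targets, test_targets = y[:250], y[250:]

rfr = RandomForestRegressor()
grid = {'n_estimators': [200], 'max_depth': [3],
        'max_features': [4, 8], 'random_state': [42]}
test_scores = []

for g in ParameterGrid(grid):
    rfr.set_params(**g)  # ** unpacks the dict into keyword arguments
    rfr.fit(train_features, train_targets)
    test_scores.append(rfr.score(test_features, test_targets))  # test-set R^2

best_idx = np.argmax(test_scores)
print(test_scores[best_idx], ParameterGrid(grid)[best_idx])
```

With two values for `max_features` and one for everything else, the loop fits two models and prints the better test R² alongside the hyperparameters that produced it.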