Tune random forest hyperparameters
As with all models, we want to optimize performance by tuning hyperparameters. Random forests have many hyperparameters, but the most important is often the number of features sampled at each split, max_features in sklearn's RandomForestRegressor. For models like random forests that have randomness built in, we also want to set random_state so that our results are reproducible.
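For reference, here is a minimal sketch of how a regressor like the rfr used later in this exercise might be created; the specific values are illustrative, not necessarily the course's exact setup.

from sklearn.ensemble import RandomForestRegressor

# Illustrative values only; the exercise provides rfr for you
rfr = RandomForestRegressor(n_estimators=200, max_depth=3,
                            max_features=4, random_state=42)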
Usually we could use sklearn's GridSearchCV() to search hyperparameters, but with a financial time series we don't want to use standard cross-validation, because it mixes older and newer data and lets the model peek at the future. Instead, we want to fit our models on the oldest data and evaluate them on the newest data, so we'll use sklearn's ParameterGrid to create the combinations of hyperparameters to search.
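The exercise already provides train_features, train_targets, test_features, and test_targets. For illustration only, a chronological (no-shuffle) split that keeps the oldest data in the training set might look like the sketch below, shown here on synthetic stand-in data; the 85% split fraction is an arbitrary choice, not the course's.

import numpy as np

# Stand-in, time-ordered data (oldest observations first)
n_obs = 500
features = np.random.randn(n_obs, 10)
targets = np.random.randn(n_obs)

# Chronological split: no shuffling, so all training data precedes the test data
train_size = int(0.85 * n_obs)  # illustrative split fraction
train_features, test_features = features[:train_size], features[train_size:]
train_targets, test_targets = targets[:train_size], targets[train_size:]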
This exercise is part of the course Machine Learning for Finance in Python.

Exercise instructions
- Set the n_estimators hyperparameter to be a list with one value (200) in the grid dictionary.
- Set the max_features hyperparameter to be a list containing 4 and 8 in the grid dictionary.
- Fit the random forest regressor model (rfr, already created for you) to train_features and train_targets with each combination of hyperparameters, g, in the loop.
- Calculate R² by using rfr.score() on test_features and test_targets, and append the result to the test_scores list.
Hands-on interactive exercise

Try this exercise and complete the sample code.
import numpy as np
from sklearn.model_selection import ParameterGrid
# Create a dictionary of hyperparameters to search
grid = {____, 'max_depth': [3], 'max_features': ____, 'random_state': [42]}
test_scores = []
# Loop through the parameter grid, set the hyperparameters, and save the scores
for g in ParameterGrid(grid):
    rfr.set_params(**g)  # ** is "unpacking" the dictionary into keyword arguments
    rfr.fit(____, ____)
    test_scores.append(rfr.score(____, ____))
# Find best hyperparameters from the test score and print
best_idx = np.argmax(test_scores)
print(test_scores[best_idx], ParameterGrid(grid)[best_idx])
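For reference, one possible completion of the scaffold is sketched below. Treat it as illustrative rather than the official solution: it assumes rfr, train_features, train_targets, test_features, and test_targets are already defined as in the exercise, and the blanks are filled in following the instructions above.

import numpy as np
from sklearn.model_selection import ParameterGrid

# Hyperparameter combinations to try (values taken from the instructions)
grid = {'n_estimators': [200], 'max_depth': [3],
        'max_features': [4, 8], 'random_state': [42]}
test_scores = []

for g in ParameterGrid(grid):
    rfr.set_params(**g)                     # apply this combination of hyperparameters
    rfr.fit(train_features, train_targets)  # fit on the oldest (training) data
    test_scores.append(rfr.score(test_features, test_targets))  # R^2 on the newest data

# The best combination is the one with the highest test-set R^2
best_idx = np.argmax(test_scores)
print(test_scores[best_idx], ParameterGrid(grid)[best_idx])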