Evaluate performance
Lastly, and as always, we want to evaluate the performance of our best model to check how well or poorly it is doing. Ideally we would back-test the model, but that's an involved process we don't have room to cover in this course.
We've already seen the R\(^2\) scores, but let's take a look at the scatter plot of predictions vs actual results using `matplotlib`. Perfect predictions would be a diagonal line from the lower left to the upper right.
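To make the link between the R\(^2\) score and the diagonal concrete, here is a minimal sketch. The helper name `r_squared` and the toy numbers are made up for illustration and are not part of the exercise. Predictions that sit exactly on the diagonal give a score of 1, while always predicting the mean of the targets gives a score of 0.

```python
import numpy as np

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)      # unexplained variation
    ss_tot = np.sum((actual - actual.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot

# Made-up target values, for illustration only
actual = np.array([-0.02, 0.01, 0.03, 0.05])

perfect = actual.copy()                          # every point on the diagonal
constant = np.full_like(actual, actual.mean())   # always predict the mean

print(r_squared(actual, perfect))   # 1.0
print(r_squared(actual, constant))  # 0.0
```

This is the same quantity that scikit-learn's `r2_score` and a regressor's `.score()` method report.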
This exercise is part of the course
Machine Learning for Finance in Python
Exercise instructions
- Use the best number for `max_features` in our `RandomForestRegressor` (`rfr`) that we found in the previous exercise (it was 4).
- Make predictions using the model with the `train_features` and `test_features`.
- Scatter actual targets (train/test_targets) vs the predictions (train/test_predictions), and label the datasets `train` and `test`.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Use the best hyperparameters from before to fit a random forest model
rfr = RandomForestRegressor(n_estimators=200, max_depth=3, max_features=____, random_state=42)
rfr.fit(train_features, train_targets)
# Make predictions with our model
train_predictions = rfr.predict(____)
test_predictions = ____
# Create a scatter plot with train and test actual vs predictions
plt.scatter(train_targets, train_predictions, label='train')
plt.scatter(____)
plt.legend()
plt.show()
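For reference, one way the same steps might look end to end is sketched below. This is not the course's solution. The data is synthetic and generated on the spot (the course's own `train_features`, `test_features` and targets are not reproduced here), the 255/45 split is an assumption, and the helper `plot_actual_vs_predicted` and the dashed diagonal are additions for illustration. Only `max_features=4` and the other hyperparameters come from the exercise text.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption): 8 features, target driven by the first two
rng = np.random.default_rng(42)
features = rng.normal(size=(300, 8))
targets = 0.5 * features[:, 0] - 0.3 * features[:, 1] + rng.normal(scale=0.5, size=300)

# Keep the rows in order and hold out the last 45 as the test set (assumed split)
train_size = 255
train_features, test_features = features[:train_size], features[train_size:]
train_targets, test_targets = targets[:train_size], targets[train_size:]

# Hyperparameters as given in the exercise, with max_features=4
rfr = RandomForestRegressor(n_estimators=200, max_depth=3, max_features=4, random_state=42)
rfr.fit(train_features, train_targets)

# Make predictions on both datasets
train_predictions = rfr.predict(train_features)
test_predictions = rfr.predict(test_features)

def plot_actual_vs_predicted(train_y, train_pred, test_y, test_pred):
    """Scatter actual vs predicted values for both datasets (hypothetical helper)."""
    fig, ax = plt.subplots()
    ax.scatter(train_y, train_pred, label='train')
    ax.scatter(test_y, test_pred, label='test')
    # Dashed diagonal: where perfect predictions would fall
    lo = min(train_y.min(), test_y.min())
    hi = max(train_y.max(), test_y.max())
    ax.plot([lo, hi], [lo, hi], linestyle='--', color='gray', label='perfect')
    ax.set_xlabel('actual')
    ax.set_ylabel('predicted')
    ax.legend()
    return ax

if __name__ == "__main__":
    plot_actual_vs_predicted(train_targets, train_predictions, test_targets, test_predictions)
    plt.show()
```

A test cloud that is visibly flatter or more scattered around the diagonal than the train cloud suggests the model fits the training data better than it generalizes.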