Movie revenue prediction with CatBoost
Let's finish this chapter on boosting by returning to the movies dataset! In this exercise, you'll build a CatBoostRegressor to predict the log-revenue. Remember that our best model so far is the AdaBoost model, with an RMSE of 5.15.
Will CatBoost beat AdaBoost? We'll use a similar set of parameters to keep the comparison fair.
Recall that these are the features we have used so far: 'budget', 'popularity', 'runtime', 'vote_average', and 'vote_count'. catboost has been imported for you as cb.
Note: be careful not to use a classifier, or your session might expire!
This exercise is part of the course
Ensemble Methods in Python
Exercise instructions
- Build and fit a CatBoostRegressor using 100 estimators, a learning rate of 0.1, and a max depth of 3.
- Calculate the predictions for the test set and print the RMSE.
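As a reminder of what the RMSE measures, here is a minimal worked example on a handful of made-up numbers (the arrays y_true and y_hat are hypothetical and have nothing to do with the movies dataset). The RMSE is the square root of the mean of the squared differences between the actual and predicted values, so it is expressed in the same units as the target; here that would be log-revenue.

```python
import numpy as np

# Hypothetical actual and predicted values, for illustration only
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_hat = np.array([2.0, 5.0, 4.0, 8.0])

# Squared error per observation: [1, 0, 4, 1]
squared_errors = (y_true - y_hat) ** 2

# Mean squared error: 6 / 4 = 1.5
mse = squared_errors.mean()

# Root mean squared error: sqrt(1.5)
rmse = np.sqrt(mse)
print('RMSE: {:.3f}'.format(rmse))  # RMSE: 1.225
```

The sample code in this exercise gets the same quantity by taking np.sqrt of scikit-learn's mean_squared_error.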
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
import catboost as cb
import numpy as np
from sklearn.metrics import mean_squared_error
# Build and fit a CatBoost regressor
reg_cat = ____.____(____, ____, ____, random_state=500)
____
# Calculate the predictions on the test set
pred = ____
# Evaluate the performance using the RMSE
rmse_cat = np.sqrt(mean_squared_error(y_test, pred))
print('RMSE (CatBoost): {:.3f}'.format(rmse_cat))