
# OOB vs CV-based early stopping

In the previous exercise, we used the OOB error and the cross-validated error to estimate the optimal number of trees in the GBM. Since these two methods can give different answers, in this exercise we will compare the performance of the resulting models on a test set. We can use the same model object for both sets of predictions because `predict.gbm()` lets you use any subset of the total number of trees (in our case, the total is 10,000).

Instructions


The `ntree_opt_oob` and `ntree_opt_cv` objects from the previous exercise (each storing an "optimal" value for `n.trees`) are loaded in the workspace.

Using the `credit_model` loaded in the workspace, generate two sets of predictions:

- One using the OOB estimate of `n.trees`: 3,233 (stored in `ntree_opt_oob`)
- And the other using the CV estimate of `n.trees`: 7,889 (stored in `ntree_opt_cv`)
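
The two prediction calls can be sketched as follows. This assumes the test set is stored in a data frame named `credit_test` (that name is an assumption, not given in the exercise text); `predict()` dispatches to `predict.gbm()` for a `gbm` model object, and the `n.trees` argument selects how many of the 10,000 trees to use:

```r
# Sketch only: `credit_test` is a hypothetical name for the test data frame.

# Predictions using the OOB-estimated optimal number of trees (3,233)
preds_oob <- predict(object  = credit_model,
                     newdata = credit_test,
                     n.trees = ntree_opt_oob)

# Predictions using the CV-estimated optimal number of trees (7,889)
preds_cv <- predict(object  = credit_model,
                    newdata = credit_test,
                    n.trees = ntree_opt_cv)
```

Because both calls reuse the same fitted `credit_model`, no retraining is needed; only the number of boosting iterations applied at prediction time differs.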