Exercise

Early stopping in GBMs

Use the gbm.perf() function to estimate the optimal number of boosting iterations (the n.trees argument) for a GBM model object, using both out-of-bag (OOB) and cross-validation (CV) error. When you train a GBM with a large number of trees (such as 10,000) and then use a validation method to pick a smaller number of trees, that is called "early stopping". The term is not unique to GBMs; it describes automatically choosing the number of iterations in any iterative learning algorithm.
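The idea of early stopping can be sketched without any model at all. The snippet below is a minimal illustration in base R, assuming a made-up validation-error curve (the names iterations, valid_error and best_iter are hypothetical and the numbers are not real results): the chosen iteration count is simply the one with the lowest validation error.

```r
# Hypothetical validation-error curve: it falls, bottoms out, then rises
# again as the model starts to overfit. Illustrative values only.
iterations <- 1:1000
valid_error <- (iterations - 300)^2 / 1e5 + 0.4

# Early stopping keeps the iteration count with the lowest validation error.
best_iter <- which.min(valid_error)
print(best_iter)
```

In a real GBM the error curve comes from OOB, test-set, or cross-validated estimates rather than a formula.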

Instructions
  • The credit_model object is loaded in the workspace.
  • Use the gbm.perf() function with the "OOB" method to get the optimal number of trees based on the OOB error and store that number as ntree_opt_oob.
  • Train a new GBM model, this time with cross-validation, so we can get a cross-validated estimate of the optimal number of trees.
  • Lastly, use the gbm.perf() function with the "cv" method to get the optimal number of trees based on the CV error and store that number as ntree_opt_cv.
  • Compare the two numbers.
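
The steps above might look roughly like the following sketch. It is self-contained and therefore does not use credit_model: it assumes a small simulated dataset and hypothetical names (toy_data, toy_model, toy_model_cv), and it uses 500 trees rather than 10,000 so that it runs quickly. The n.trees, cv.folds and n.cores values are illustrative choices, not the exercise's settings.

```r
library(gbm)

# Simulated binary-outcome data (an assumption; stands in for the real training set).
set.seed(1)
n <- 500
toy_data <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy_data$y <- rbinom(n, size = 1, prob = plogis(1.5 * toy_data$x1 - toy_data$x2))

# GBM without cross-validation; bag.fraction < 1 is what makes OOB estimates available.
toy_model <- gbm(y ~ ., distribution = "bernoulli", data = toy_data,
                 n.trees = 500, bag.fraction = 0.5)

# Optimal number of trees based on the OOB error.
ntree_opt_oob <- gbm.perf(toy_model, method = "OOB", plot.it = FALSE)

# Same model trained with 2-fold cross-validation.
toy_model_cv <- gbm(y ~ ., distribution = "bernoulli", data = toy_data,
                    n.trees = 500, cv.folds = 2, n.cores = 1)

# Optimal number of trees based on the CV error.
ntree_opt_cv <- gbm.perf(toy_model_cv, method = "cv", plot.it = FALSE)

# Compare the two estimates.
print(paste0("Optimal n.trees (OOB estimate): ", ntree_opt_oob))
print(paste0("Optimal n.trees (CV estimate): ", ntree_opt_cv))
```

When called with method = "OOB", gbm.perf() itself warns that the OOB estimate generally underestimates the optimal number of iterations, so the two numbers will often differ.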