Session Ready
Exercise

Train a GBM model

Here you will use the gbm() function to train a GBM classifier to predict loan default. You will train a 10,000-tree GBM on the credit_train dataset, which is pre-loaded into your workspace.

Using such a large number of trees (10,000) is probably not optimal for a GBM model, but we will build more trees than we need and then select the optimal number of trees based on early performance-based stopping. The best GBM model will likely contain fewer trees than we started with.

For binary classification, gbm() requires the response to be encoded as 0/1 (numeric), so we will have to convert from a "no/yes" factor to a 0/1 numeric response column.

Also, the the gbm() function requires the user to specify a distribution argument. For a binary classification problem, you should set distribution = "bernoulli". The Bernoulli distribution models a binary response.

Instructions
100 XP
  • Convert from a "no/yes" factor to a 0/1 numeric response column using the ifelse() function.
  • Train a 10,000-tree GBM model.