Exercise

Another round of pruning based on AUC

In the video you saw how the "full" logistic regression model with a logit link was pruned based on the AUC. The variable home_ownership was deleted from the model because removing it improved the overall AUC. After two more rounds of the same procedure, the variables age and ir_cat were also deleted, leading to the model:

log_3_remove_ir <- glm(loan_status ~ loan_amnt + grade + annual_inc + emp_cat, family = binomial, data = training_set)

with an AUC of 0.6545. Now it's your turn to see whether the AUC can be improved further by deleting another variable from the model.

Instructions

  • Delete one variable at a time from the model log_3_remove_ir. Remember to use the default link function (logit).
  • Make probability-of-default predictions for each of the models you created.
  • Use the function auc(), with test_set$loan_status as the first argument and the predictions of each of the four models as the second argument, to obtain the AUC of each model.
  • Copy the name of the object (as given in the first question of this exercise) that represents the model with the best AUC.
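A minimal sketch of these steps in R follows. It assumes that auc() comes from the pROC package, as in this course, and the object names log_4_remove_* are hypothetical choices, not necessarily the names the exercise expects. Because training_set and test_set are not provided here, the block first builds a small synthetic loan data set with the same column names so it runs standalone; in the exercise itself, skip that part and use the existing objects. Any AUC values it prints come from made-up data, not from the course data.

```r
# Assumption: the pROC package (install.packages("pROC")) supplies auc().
library(pROC)

# --- Hypothetical synthetic data, only so this sketch runs standalone. ---
# In the exercise, training_set and test_set already exist; skip this part there.
set.seed(42)
n <- 2000
loans <- data.frame(
  loan_amnt  = round(runif(n, 1000, 35000)),
  grade      = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE)),
  annual_inc = round(rlnorm(n, meanlog = 10.9, sdlog = 0.5)),
  emp_cat    = factor(sample(c("0-15", "15-30", "30-45", "45+", "Missing"),
                             n, replace = TRUE))
)
# Made-up default mechanism: worse grade and lower income raise the default risk.
lin <- -2.2 + 0.4 * as.integer(loans$grade) -
  0.3 * scale(log(loans$annual_inc))[, 1]
loans$loan_status <- rbinom(n, 1, plogis(lin))
idx <- sample(n, size = floor(2 * n / 3))
training_set <- loans[idx, ]
test_set     <- loans[-idx, ]

# --- Baseline: the current model from the exercise text. ---
log_3_remove_ir <- glm(loan_status ~ loan_amnt + grade + annual_inc + emp_cat,
                       family = binomial, data = training_set)

# --- Delete one variable at a time; binomial() uses the logit link by default. ---
models <- list(
  log_4_remove_amnt  = glm(loan_status ~ grade + annual_inc + emp_cat,
                           family = binomial, data = training_set),
  log_4_remove_grade = glm(loan_status ~ loan_amnt + annual_inc + emp_cat,
                           family = binomial, data = training_set),
  log_4_remove_inc   = glm(loan_status ~ loan_amnt + grade + emp_cat,
                           family = binomial, data = training_set),
  log_4_remove_emp   = glm(loan_status ~ loan_amnt + grade + annual_inc,
                           family = binomial, data = training_set)
)

# Probability-of-default predictions on the test set, then the AUC of each model.
test_auc <- function(model) {
  pred <- predict(model, newdata = test_set, type = "response")
  as.numeric(auc(test_set$loan_status, pred))
}
aucs <- sapply(models, test_auc)
baseline_auc <- test_auc(log_3_remove_ir)

print(round(aucs, 4))
best <- names(which.max(aucs))
cat("Best reduced model:", best, "\n")
# Pruning continues only if the best reduced model beats the current model.
cat("Improves on log_3_remove_ir:", aucs[[best]] > baseline_auc, "\n")
```

Each candidate model keeps the logit link because family = binomial defaults to it, and type = "response" makes predict() return probabilities of default rather than log-odds. The final comparison mirrors the pruning rule described above: a variable is deleted only when doing so raises the AUC.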