Another round of pruning based on AUC
In the video you saw how the "full" logistic regression model with a logit link was pruned based on the AUC. The variable home_ownership was deleted from the model, as doing so improved the overall AUC. After repeating this process for two additional rounds, the variables age and ir_cat were deleted, leading to the model:
log_3_remove_ir <- glm(loan_status ~ loan_amnt + grade + annual_inc + emp_cat, family = binomial, data = training_set)
with an AUC of 0.6545. Now, it's your turn to see whether the AUC can still be improved by deleting another variable from the model.
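For reference, a single pruning round like the ones above can be sketched as a loop: refit the model with each remaining predictor dropped in turn, then compare the resulting test-set AUCs. This is a minimal sketch assuming the pROC package provides auc() and that training_set and test_set exist as in the course; it is not the course's exact solution code.

```r
# Sketch of one AUC-based pruning round (assumes pROC is loaded and
# training_set / test_set are available, as in the course).
library(pROC)

predictors <- c("loan_amnt", "grade", "annual_inc", "emp_cat")

aucs <- sapply(predictors, function(drop_var) {
  kept <- setdiff(predictors, drop_var)           # drop one variable
  fml  <- reformulate(kept, response = "loan_status")
  fit  <- glm(fml, family = binomial, data = training_set)
  pred <- predict(fit, newdata = test_set, type = "response")
  auc(test_set$loan_status, pred)                 # test-set AUC without drop_var
})

aucs  # named vector: the AUC obtained after removing each variable
```

The variable whose removal yields the highest AUC is the next candidate for deletion; pruning stops once no removal improves on the current model's AUC.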
This exercise is part of the course Credit Risk Modeling in R.
Exercise instructions
- Delete one variable at a time from the model log_3_remove_ir. Remember that you should be using the default link function (logit).
- Make probability-of-default (PD) predictions for each of the models you created.
- Use the function auc() with test_set$loan_status as the first argument and the predictions for each of the four models as the second argument to obtain the AUC for each model.
- Copy the name of the object (as given in the first question of this exercise) that represents the model with the best AUC.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Build four models each time deleting one variable in log_3_remove_ir
log_4_remove_amnt <- glm(loan_status ~ grade + annual_inc + emp_cat,
family = binomial, data = training_set)
log_4_remove_grade <- glm(loan_status ~ loan_amnt + annual_inc + emp_cat,
                          family = binomial, data = training_set)
log_4_remove_inc <- glm(loan_status ~ loan_amnt + grade + emp_cat,
                        family = binomial, data = training_set)
log_4_remove_emp <- glm(loan_status ~ loan_amnt + grade + annual_inc,
                        family = binomial, data = training_set)
# Make PD-predictions for each of the models
pred_4_remove_amnt <- predict(log_4_remove_amnt, newdata = test_set, type = "response")
pred_4_remove_grade <- predict(log_4_remove_grade, newdata = test_set, type = "response")
pred_4_remove_inc <- predict(log_4_remove_inc, newdata = test_set, type = "response")
pred_4_remove_emp <- predict(log_4_remove_emp, newdata = test_set, type = "response")
# Compute the AUCs
auc(test_set$loan_status, pred_4_remove_amnt)
auc(test_set$loan_status, pred_4_remove_grade)
auc(test_set$loan_status, pred_4_remove_inc)
auc(test_set$loan_status, pred_4_remove_emp)