Further model reduction?
After deleting the variable loan_amnt, the AUC improves further to 0.6548! The resulting model is:
log_4_remove_amnt <- glm(loan_status ~ grade + annual_inc + emp_cat, family = binomial, data = training_set)
Is it possible to reduce the logistic regression model to only two variables without reducing the AUC? In this exercise you will find out!
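The reduction strategy used here is a form of backward elimination, with the test-set AUC as the selection criterion. The sketch below shows one round of it in base R on simulated data. The data frame sim, its columns x1 to x3, the split into train and test, and the helpers auc_rank() and fit_auc() are all hypothetical stand-ins for the course's training_set and test_set. It also swaps pROC's auc() for the rank-based (Mann-Whitney) AUC formula so that it runs without extra packages.

```r
set.seed(1)

# Hypothetical simulated data standing in for the course's loan data
n <- 2000
sim <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rnorm(n)   # pure noise: unrelated to the outcome
)
sim$y <- rbinom(n, 1, plogis(-2 + 0.8 * sim$x1 + 0.4 * sim$x2))

# Split into training and test halves
idx <- sample(n, n / 2)
train <- sim[idx, ]
test <- sim[-idx, ]

# AUC via the rank (Mann-Whitney) formula: the probability that a randomly
# chosen positive (y = 1) gets a higher score than a randomly chosen negative
auc_rank <- function(y, score) {
  r <- rank(score)               # ties receive average ranks
  n_pos <- sum(y == 1)
  n_neg <- sum(y == 0)
  (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Test-set AUC of a logistic regression (default logit link) on the given predictors
fit_auc <- function(vars) {
  f <- reformulate(vars, response = "y")
  m <- glm(f, family = binomial, data = train)
  auc_rank(test$y, predict(m, newdata = test, type = "response"))
}

# One round of backward elimination: drop each variable in turn
vars <- c("x1", "x2", "x3")
full_auc <- fit_auc(vars)
drop_aucs <- sapply(vars, function(v) fit_auc(setdiff(vars, v)))
print(full_auc)
print(drop_aucs)

# Keep the reduced model only if it does not lower the AUC
best_drop <- names(which.max(drop_aucs))
if (drop_aucs[[best_drop]] >= full_auc) vars <- setdiff(vars, best_drop)
print(vars)
```

Choosing variables on the same test set that reports the final AUC mirrors the exercise, but it makes that reported AUC somewhat optimistic.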
This exercise is part of the course Credit Risk Modeling in R.
Exercise instructions
- Again, delete one variable at a time from the model log_4_remove_amnt. Remember that you should be using the default link function (logit).
- Use predict() to make probability of default predictions for each model you created.
- Obtain the AUCs for each of the three models, using test_set$loan_status as the first argument and the predictions of each of the three models as the second argument.
- Plot the ROC-curve for the model with the highest AUC, using plot(roc()), where the arguments of roc() are the same as those of the auc() call with the highest AUC. Note that it is possible that none of the three reduced models improves on the AUC of log_4_remove_amnt. The predictions for that model are loaded in your workspace as pred_4_remove_amnt, in case it turns out to have the highest AUC.
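To see what roc() and auc() summarise, here is a minimal base-R sketch (not the pROC implementation) that sweeps a cutoff over the predicted PDs, records sensitivity and 1 - specificity at each cutoff, and integrates the resulting curve with the trapezoidal rule. The vectors y and pd and the helpers roc_points() and auc_trapezoid() are hypothetical toy examples.

```r
# Hypothetical toy data: observed default flags and predicted PDs
y  <- c(0, 0, 1, 0, 1, 1, 0, 1)
pd <- c(0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.75, 0.90)

# ROC points: classify as default when pd >= cutoff, for every distinct cutoff
roc_points <- function(y, pd) {
  cutoffs <- c(Inf, sort(unique(pd), decreasing = TRUE))
  tpr <- sapply(cutoffs, function(k) mean(pd[y == 1] >= k))  # sensitivity
  fpr <- sapply(cutoffs, function(k) mean(pd[y == 0] >= k))  # 1 - specificity
  data.frame(cutoff = cutoffs, fpr = fpr, tpr = tpr)
}

# Area under the ROC curve by the trapezoidal rule
auc_trapezoid <- function(pts) {
  sum(diff(pts$fpr) * (head(pts$tpr, -1) + tail(pts$tpr, -1)) / 2)
}

pts <- roc_points(y, pd)
print(pts)
auc_value <- auc_trapezoid(pts)
print(auc_value)  # 0.75

# ROC curve, with the diagonal of a random classifier for reference
plot(pts$fpr, pts$tpr, type = "l", xlab = "1 - specificity", ylab = "sensitivity")
abline(0, 1, lty = 2)
```

For this toy data the area is 0.75: 12 of the 16 (default, non-default) pairs have the default ranked higher. A model whose curve sits closer to the top-left corner has a larger AUC, which is why the exercise plots the ROC-curve of the model with the highest AUC.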
Have a go at this exercise by completing this sample code.
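# auc() and roc() come from the pROC package
library(pROC)
# Build three models, each time deleting one variable from log_4_remove_amnt
log_5_remove_grade <- glm(loan_status ~ annual_inc + emp_cat, family = binomial, data = training_set)
log_5_remove_inc <- glm(loan_status ~ grade + emp_cat, family = binomial, data = training_set)
log_5_remove_emp <- glm(loan_status ~ grade + annual_inc, family = binomial, data = training_set)
# Make PD-predictions for each of the models
pred_5_remove_grade <- predict(log_5_remove_grade, newdata = test_set, type = "response")
pred_5_remove_inc <- predict(log_5_remove_inc, newdata = test_set, type = "response")
pred_5_remove_emp <- predict(log_5_remove_emp, newdata = test_set, type = "response")
# Compute the AUCs, including the current model (predictions in pred_4_remove_amnt)
preds <- list(remove_amnt = pred_4_remove_amnt, remove_grade = pred_5_remove_grade,
              remove_inc = pred_5_remove_inc, remove_emp = pred_5_remove_emp)
aucs <- sapply(preds, function(p) auc(test_set$loan_status, p))
aucs
# Plot the ROC-curve for the best model here
best <- names(which.max(aucs))
plot(roc(test_set$loan_status, preds[[best]]))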