Further model reduction?

Deleting the variable loan_amnt improves the AUC further, to 0.6548! The resulting model is

log_4_remove_amnt <- glm(loan_status ~ grade + annual_inc + emp_cat, family = binomial, data = training_set) 

Is it possible to reduce the logistic regression model to only two variables without reducing the AUC? In this exercise you will see if it is possible!

This exercise is part of the course

Credit Risk Modeling in R

Exercise instructions

  • Again, delete one variable at a time in the model log_4_remove_amnt. Remember that you should be using the default link function (logit).
  • Use predict() to make probability of default predictions for each model you created.
  • Obtain the AUCs for each of the three models, using test_set$loan_status as the first argument and the predictions for each of the three models as the second argument.
  • Plot the ROC-curve for the model with the highest AUC in your workspace, using plot(roc()), where the arguments to roc() are the same as those passed to auc() for the model with the highest AUC. Note that the model may not be reducible any further: removing another variable from log_4_remove_amnt might not improve the AUC. In that case, log_4_remove_amnt has the highest AUC, and its predictions are loaded in your workspace as pred_4_remove_amnt.
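
The steps above can be sketched end to end on simulated data. This is only an illustration under stated assumptions: the real training_set and test_set live in the course workspace, so the data generator make_loans() and its coefficients are made up, and the helper auc_rank() is a hypothetical base-R stand-in for auc() that takes the response first and the predictions second. The AUC values it prints say nothing about the real data.

```r
# Simulated stand-in data: the real training_set and test_set come from the
# course workspace and are NOT reproduced here.
set.seed(42)
make_loans <- function(n) {
  grade      <- factor(sample(LETTERS[1:4], n, replace = TRUE))
  annual_inc <- round(rlnorm(n, meanlog = 11, sdlog = 0.5))
  emp_cat    <- factor(sample(c("0-15", "15-30", "30+"), n, replace = TRUE))
  # Hypothetical default mechanism: worse grade and lower income raise the PD
  lin <- -2.5 + 0.5 * (as.integer(grade) - 1) - 0.3 * (log(annual_inc) - 11)
  loan_status <- rbinom(n, 1, plogis(lin))
  data.frame(loan_status, grade, annual_inc, emp_cat)
}
training_set <- make_loans(4000)
test_set     <- make_loans(2000)

# Rank-based AUC (Mann-Whitney form), used here to avoid a package dependency
auc_rank <- function(response, predictor) {
  r  <- rank(predictor)            # ties receive average ranks
  n1 <- sum(response == 1)
  n0 <- sum(response == 0)
  (sum(r[response == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Starting model, with the same formula as log_4_remove_amnt
full <- glm(loan_status ~ grade + annual_inc + emp_cat,
            family = binomial, data = training_set)

# Drop one variable at a time; update() refits with the reduced formula
dropped <- c("grade", "annual_inc", "emp_cat")
aucs <- sapply(dropped, function(v) {
  fit  <- update(full, as.formula(paste(". ~ . -", v)))
  pred <- predict(fit, newdata = test_set, type = "response")
  auc_rank(test_set$loan_status, pred)
})
print(round(aucs, 4))
```

Each entry of aucs is named after the variable that was removed, so the largest entry points to the reduced model worth comparing against the AUC of the starting model. Using update() keeps the family (binomial with the default logit link) and the data argument identical across all refits.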

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Build three models each time deleting one variable in log_4_remove_amnt
log_5_remove_grade <- glm(loan_status ~ annual_inc + emp_cat, family = binomial, data = training_set) 
log_5_remove_inc <- 
log_5_remove_emp <- 

# Make PD-predictions for each of the models
pred_5_remove_grade <- predict(log_5_remove_grade, newdata = test_set, type = "response")
pred_5_remove_inc <-
pred_5_remove_emp <-

# Compute the AUCs



# Plot the ROC-curve for the best model here
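
To see what the ROC curve and its AUC represent, the sketch below builds the curve by hand in base R. It is not the plot(roc()) call the exercise asks for: the labels in status and the predicted PDs in pd are toy values invented for illustration, and the names cutoffs, tpr, fpr and auc_trap are hypothetical.

```r
# Toy labels and predicted PDs (made up for illustration, not course data)
status <- c(0, 0, 1, 0, 1, 1, 0, 1)
pd     <- c(0.05, 0.10, 0.35, 0.40, 0.55, 0.70, 0.20, 0.90)

# Sweep the cut-off from high to low; each cut-off yields one point on the curve
cutoffs <- c(Inf, sort(unique(pd), decreasing = TRUE))
tpr <- sapply(cutoffs, function(cut) mean(pd[status == 1] >= cut))  # sensitivity
fpr <- sapply(cutoffs, function(cut) mean(pd[status == 0] >= cut))  # 1 - specificity

# Area under the piecewise-linear curve via the trapezoidal rule
auc_trap <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
print(auc_trap)  # 0.9375: 15 of the 16 default/non-default pairs are ranked correctly

plot(fpr, tpr, type = "l", xlab = "False positive rate",
     ylab = "True positive rate", main = "ROC curve (toy data)")
abline(0, 1, lty = 2)  # reference line of a random classifier
```

A curve that hugs the top-left corner corresponds to an AUC close to 1, while the dashed diagonal corresponds to an AUC of 0.5.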