
Further model reduction?

After deleting the variable loan_amnt, the AUC improves further to 0.6548! The resulting model is

log_4_remove_amnt <- glm(loan_status ~ grade + annual_inc + emp_cat, family = binomial, data = training_set) 

Is it possible to reduce the logistic regression model to only two variables without reducing the AUC? In this exercise you will find out!

This exercise is part of the course

Credit Risk Modeling in R


Instructions

  • Again, delete one variable at a time from the model log_4_remove_amnt. Remember that you should be using the default link function (logit).
  • Use predict() to make probability of default predictions for each model you created.
  • Obtain the AUCs for each of the three models, using test_set$loan_status as the first argument and the predictions for the respective model as the second argument.
  • Plot the ROC curve for the model with the highest AUC in your workspace, using plot(roc()), where the arguments to roc() are the same as those passed to auc() for that model. Note that it is possible that none of the reduced models improves on the AUC of log_4_remove_amnt. In case log_4_remove_amnt still has the highest AUC, its predictions are loaded in your workspace as pred_4_remove_amnt.
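
The steps in this list can be sketched end to end on made-up data. This is only an illustration under stated assumptions: the data frame below is simulated and stands in for the course's training_set and test_set, so the resulting AUC values say nothing about the real exercise. The helper auc_rank() is a hypothetical base-R substitute (a rank-based Mann-Whitney estimate) used so the sketch runs without extra packages; the exercise itself expects auc() and roc(), presumably from the pROC package.

```r
# Simulated stand-ins for the course's training_set and test_set
# (assumption: the real loan data are not reproduced here).
set.seed(42)
n <- 2000
loans <- data.frame(
  grade      = factor(sample(LETTERS[1:4], n, replace = TRUE)),
  annual_inc = round(rlnorm(n, meanlog = 11, sdlog = 0.5)),
  emp_cat    = factor(sample(c("0-15", "15-30", "30+"), n, replace = TRUE))
)
# In this toy setup, default risk rises with worse grade and falls with income.
lin <- -2 + 0.4 * (as.integer(loans$grade) - 1) -
  0.3 * scale(log(loans$annual_inc))[, 1]
loans$loan_status <- rbinom(n, 1, plogis(lin))

train_idx    <- sample(n, size = round(2 / 3 * n))
training_set <- loans[train_idx, ]
test_set     <- loans[-train_idx, ]

# Hypothetical helper: rank-based (Mann-Whitney) AUC in base R.
auc_rank <- function(labels, scores) {
  r  <- rank(scores)          # average ranks handle tied scores
  n1 <- sum(labels == 1)
  n0 <- sum(labels == 0)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Each formula drops one variable from grade + annual_inc + emp_cat.
formulas <- list(
  remove_grade = loan_status ~ annual_inc + emp_cat,
  remove_inc   = loan_status ~ grade + emp_cat,
  remove_emp   = loan_status ~ grade + annual_inc
)

# Fit each reduced model, predict default probabilities, and score them.
aucs <- sapply(formulas, function(f) {
  fit  <- glm(f, family = binomial, data = training_set)  # logit is the default link
  pred <- predict(fit, newdata = test_set, type = "response")
  auc_rank(test_set$loan_status, pred)
})

print(aucs)
best <- names(which.max(aucs))  # name of the reduced model with the highest AUC
```

Storing the formulas in a named list keeps the three fits symmetric and makes the best model easy to pick with which.max(); in the exercise itself, each model and prediction vector is assigned to its own named object instead.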

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Build three models each time deleting one variable in log_4_remove_amnt
log_5_remove_grade <- glm(loan_status ~ annual_inc + emp_cat, family = binomial, data = training_set) 
log_5_remove_inc <- 
log_5_remove_emp <- 

# Make PD-predictions for each of the models
pred_5_remove_grade <- predict(log_5_remove_grade, newdata = test_set, type = "response")
pred_5_remove_inc <-
pred_5_remove_emp <-

# Compute the AUCs



# Plot the ROC-curve for the best model here