
Another round of pruning based on AUC

In the video, you saw how the "full" logistic regression model with a logit link was pruned based on the AUC: the variable home_ownership was deleted from the model because removing it improved the overall AUC. Repeating this process for two additional rounds removed the variables age and ir_cat, leading to the model:

log_3_remove_ir <- glm(loan_status ~ loan_amnt + grade + annual_inc + emp_cat, family = binomial, data = training_set)

with an AUC of 0.6545. Now, it's your turn to see whether the AUC can still be improved by deleting another variable from the model.
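For reference, the AUC reported above can be reproduced along these lines. This is a minimal sketch that assumes the auc() function from the pROC package and the training_set/test_set objects used throughout the course; the prediction object name pred_3_remove_ir is just for illustration:

# Fit the current model and compute its AUC on the test set (assumes pROC is installed)
library(pROC)

log_3_remove_ir <- glm(loan_status ~ loan_amnt + grade + annual_inc + emp_cat,
                       family = binomial, data = training_set)

# Predicted probabilities of default on the test set
pred_3_remove_ir <- predict(log_3_remove_ir, newdata = test_set, type = "response")

# AUC of the current model (0.6545 in the video)
auc(test_set$loan_status, pred_3_remove_ir)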

This exercise is part of the course

Credit Risk Modeling in R


Instructions

  • Delete one variable at a time from the model log_3_remove_ir. Remember to use the default link function (logit).
  • Make probability of default (PD) predictions for each of the models you created.
  • Use the auc() function with test_set$loan_status as the first argument and the predictions for each of the four models as the second argument to obtain the AUC for each model.
  • Copy the name of the object (as given in the first question of this exercise) that represents the model with the best AUC.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Build four models each time deleting one variable in log_3_remove_ir
log_4_remove_amnt <- glm(loan_status ~ grade + annual_inc + emp_cat, 
                        family = binomial, data = training_set) 
log_4_remove_grade <- glm(loan_status ~ loan_amnt + annual_inc + emp_cat, 
                        family = binomial, data = training_set)
log_4_remove_inc <- glm(loan_status ~ loan_amnt + grade + emp_cat, 
                        family = binomial, data = training_set)
log_4_remove_emp <- glm(loan_status ~ loan_amnt + grade + annual_inc, 
                        family = binomial, data = training_set)

# Make PD-predictions for each of the models
pred_4_remove_amnt <- predict(log_4_remove_amnt, newdata = test_set, type = "response")
pred_4_remove_grade <- predict(log_4_remove_grade, newdata = test_set, type = "response")
pred_4_remove_inc <- predict(log_4_remove_inc, newdata = test_set, type = "response")
pred_4_remove_emp <- predict(log_4_remove_emp, newdata = test_set, type = "response")

# Compute the AUCs (auc() comes from the pROC package)
auc(test_set$loan_status, pred_4_remove_amnt)
auc(test_set$loan_status, pred_4_remove_grade)
auc(test_set$loan_status, pred_4_remove_inc)
auc(test_set$loan_status, pred_4_remove_emp)
  
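Once the four AUCs are computed, a compact way to compare them is to collect them in a named vector and look up the maximum. This is a minimal sketch; the vector name auc_4 and the use of which.max() are just illustrative, not part of the exercise template:

# Collect the four AUCs and identify the model with the highest value
auc_4 <- c(remove_amnt  = auc(test_set$loan_status, pred_4_remove_amnt),
           remove_grade = auc(test_set$loan_status, pred_4_remove_grade),
           remove_inc   = auc(test_set$loan_status, pred_4_remove_inc),
           remove_emp   = auc(test_set$loan_status, pred_4_remove_emp))

auc_4                    # compare against the current AUC of 0.6545
names(which.max(auc_4))  # deletion that gives the highest AUC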