Further model reduction?

By deleting the variable loan_amnt, the AUC can be further improved to 0.6548! The resulting model is:

log_4_remove_amnt <- glm(loan_status ~ grade + annual_inc + emp_cat, family = binomial, data = training_set) 

Is it possible to reduce the logistic regression model to only two variables without reducing the AUC? In this exercise you will find out!
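For intuition about the metric being compared here: AUC can be read as the probability that a randomly chosen defaulted loan receives a higher predicted probability of default than a randomly chosen non-defaulted one. The base-R sketch below computes it from ranks. The helper name auc_rank and the toy vectors are illustrative assumptions and are not part of the course code, which relies on auc() instead.

```r
# Hypothetical helper: rank-based (Mann-Whitney) AUC in base R.
# labels: 0/1 vector (1 = default); scores: predicted probabilities of default.
auc_rank <- function(labels, scores) {
  r <- rank(scores)                 # average ranks, so tied scores count as 1/2
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data, made up purely for illustration
toy_labels <- c(0, 0, 1, 0, 1, 1)
toy_scores <- c(0.10, 0.40, 0.35, 0.20, 0.80, 0.70)

toy_auc <- auc_rank(toy_labels, toy_scores)
print(toy_auc)
```

In this toy example, 8 of the 9 default/non-default pairs are ordered correctly, so the value is 8/9. A model whose AUC stays the same after a variable is dropped ranks the loans just as well without it.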

This exercise is part of the course

Credit Risk Modeling in R

Exercise instructions

  • Again, delete one variable at a time in the model log_4_remove_amnt. Remember that you should be using the default link function (logit).
  • Use predict() to make probability of default predictions for each model you created.
  • Obtain the AUCs for each of the three models, using test_set$loan_status as the first argument and the predictions for each of the three models as the second argument.
  • Plot the ROC curve for the model with the highest AUC in your workspace, using plot(roc()), where the arguments of roc() are the same as those passed to auc() for that model. Note that it is possible that none of the reduced models improves on the AUC of model log_4_remove_amnt. The predictions for this model are loaded in your workspace as pred_4_remove_amnt, in case this model leads to the highest AUC.
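The steps in these instructions can be sketched end to end on simulated data. This is a rough illustration under stated assumptions, not the exercise solution: the data frame sim, its coefficients, and the names sim_train, sim_test, formulas, models, preds and aucs are all made up here, and the pROC package is assumed to be the source of auc() and roc().

```r
library(pROC)  # assumed source of auc() and roc()

set.seed(1)
n <- 2000

# Simulated stand-in for the loan data (all values are invented)
sim <- data.frame(
  grade      = factor(sample(LETTERS[1:4], n, replace = TRUE)),
  annual_inc = round(rlnorm(n, meanlog = 11, sdlog = 0.5)),
  emp_cat    = factor(sample(c("0-15", "15-30", "30+"), n, replace = TRUE))
)
lin <- -2.5 + 0.4 * (as.integer(sim$grade) - 1) - 0.3 * (log(sim$annual_inc) - 11)
sim$loan_status <- rbinom(n, 1, plogis(lin))

# Hypothetical train/test split
train_idx <- sample(n, size = round(2 * n / 3))
sim_train <- sim[train_idx, ]
sim_test  <- sim[-train_idx, ]

# One reduced model per deleted variable (logit link is the binomial default)
formulas <- list(
  remove_grade = loan_status ~ annual_inc + emp_cat,
  remove_inc   = loan_status ~ grade + emp_cat,
  remove_emp   = loan_status ~ grade + annual_inc
)
models <- lapply(formulas, function(f) glm(f, family = binomial, data = sim_train))

# Probability-of-default predictions on the test set
preds <- lapply(models, predict, newdata = sim_test, type = "response")

# AUC per model: observed status first, predictions second
aucs <- sapply(preds, function(p) as.numeric(auc(sim_test$loan_status, p)))
print(aucs)

# ROC curve for the model with the highest AUC
best <- names(which.max(aucs))
plot(roc(sim_test$loan_status, preds[[best]]))
```

Because the data are simulated, the AUC values printed here say nothing about the course data. Only the shape of the workflow carries over.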

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Build three models each time deleting one variable in log_4_remove_amnt
log_5_remove_grade <- glm(loan_status ~ annual_inc + emp_cat, family = binomial, data = training_set) 
log_5_remove_inc <- 
log_5_remove_emp <- 

# Make PD-predictions for each of the models
pred_5_remove_grade <- predict(log_5_remove_grade, newdata = test_set, type = "response")
pred_5_remove_inc <-
pred_5_remove_emp <-

# Compute the AUCs



# Plot the ROC-curve for the best model here
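The predict() calls in the sample code pass type = "response" so that the output is a probability of default and not a value on the log-odds scale. A small sketch of the difference, using simulated toy data (the data frame toy and the names fit, eta and p are assumptions made for illustration):

```r
set.seed(42)

# Simulated toy data (hypothetical; not the course's loan data)
toy <- data.frame(x = rnorm(200))
toy$y <- rbinom(200, 1, plogis(-1 + 0.8 * toy$x))

# family = binomial uses the logit link unless another link is requested
fit <- glm(y ~ x, family = binomial, data = toy)
link_name <- family(fit)$link

eta <- predict(fit, newdata = toy, type = "link")      # log-odds scale
p   <- predict(fit, newdata = toy, type = "response")  # probability scale

# Applying the inverse logit to the log-odds recovers the probabilities
same <- isTRUE(all.equal(unname(plogis(eta)), unname(p)))
print(link_name)
print(same)
```

Omitting type = "response" would hand log-odds to auc(). The ranking of observations, and hence the AUC, would be unchanged, but the values could no longer be read as probabilities of default.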