Comparing link functions for a given cut-off

In this last exercise, you will fit a model using each of the three link functions (logit, probit and cloglog), make predictions for the test set, classify the predictions in the appropriate group (default versus non-default) for a given cut-off, make a confusion matrix and compute the accuracy and sensitivity for each of the models given the cut-off value! Wow, you've learned a lot so far. And finally, you will try to identify the model that performs best in terms of accuracy given the cut-off value!

It is important to know that the differences between the models will generally be very small, and again, the results will depend on the chosen cut-off value. The observed outcome (default versus non-default) is stored in true_val in the console.

This exercise is part of the course

Credit Risk Modeling in R

View Course

Exercise instructions

  • Fit three logistic regression models using links logit, probit and cloglog respectively. Part of the code is given. Use age, emp_cat, ir_cat and loan_amnt as predictors.
  • Make predictions for all models using the test_set.
  • Use a cut-off value of 14% to make predictions for each of the models, such that their performance can be evaluated.
  • Make a confusion matrix for the three models.
  • Lastly, compute the classification accuracy for all three models.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Fit the logit, probit and cloglog-link logistic regression models
log_model_logit <- glm(loan_status ~ age + emp_cat + ir_cat + loan_amnt,
                       family = binomial(link = logit), data = training_set)
log_model_probit <- 

log_model_cloglog <-  
  
# Make predictions for all models using the test set
predictions_logit <- predict(log_model_logit, newdata = test_set, type = "response")
predictions_probit <- 
predictions_cloglog <- 
  
# Use a cut-off of 14% to make binary predictions-vectors
cutoff <- 0.14
class_pred_logit <- ifelse(predictions_logit > cutoff, 1, 0)
class_pred_probit <- 
class_pred_cloglog <- 
  
# Make a confusion matrix for the three models
tab_class_logit <- table(true_val,class_pred_logit)
tab_class_probit <- 
tab_class_cloglog <- 
  
# Compute the classification accuracy for all three models
acc_logit <- sum(diag(tab_class_logit)) / nrow(test_set)
acc_probit <- 
acc_cloglog <-