Comparing link functions for a given cut-off
You've learned a lot so far. In this last exercise, you will fit a model using each of the three link functions (logit, probit and cloglog), make predictions for the test set, classify the predictions as default or non-default for a given cut-off, make a confusion matrix, and compute the accuracy and sensitivity of each model at that cut-off. Finally, you will try to identify the model that performs best in terms of accuracy given the cut-off value.
It is important to know that the differences between the models will generally be very small, and again, the results will depend on the chosen cut-off value. The observed outcome (default versus non-default) is stored in true_val in the console.
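The three links differ only in how they map the linear predictor to a probability. The following sketch, which uses base R only, compares the three inverse link functions on a small grid. The grid eta and the variable names are illustrative and not part of the exercise. Fitted coefficients adjust to the chosen link, so fitted probabilities are typically closer to each other than these raw curves suggest.

```r
# Linear predictor values (illustrative grid, not taken from the exercise data)
eta <- seq(-3, 3, by = 1)

# Inverse link functions: each maps the linear predictor to a probability
p_logit   <- plogis(eta)          # logit:   1 / (1 + exp(-eta))
p_probit  <- pnorm(eta)           # probit:  standard normal CDF
p_cloglog <- 1 - exp(-exp(eta))   # cloglog: complementary log-log

# Compare the three mappings side by side
round(cbind(eta, p_logit, p_probit, p_cloglog), 3)
```

Note that the logit and probit curves are symmetric around a probability of 0.5, while the cloglog curve is not.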
This exercise is part of the course
Credit Risk Modeling in R
Exercise instructions
- Fit three logistic regression models using links logit, probit and cloglog respectively. Part of the code is given. Use age, emp_cat, ir_cat and loan_amnt as predictors.
- Make predictions for all models using the test_set.
- Use a cut-off value of 14% to make predictions for each of the models, such that their performance can be evaluated.
- Make a confusion matrix for the three models.
- Lastly, compute the classification accuracy for all three models.
Hands-on interactive exercise
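The completed sample code fits all three models and evaluates them at the same cut-off.

# Fit the logit, probit and cloglog-link logistic regression models
log_model_logit <- glm(loan_status ~ age + emp_cat + ir_cat + loan_amnt,
                       family = binomial(link = logit), data = training_set)
log_model_probit <- glm(loan_status ~ age + emp_cat + ir_cat + loan_amnt,
                        family = binomial(link = probit), data = training_set)
log_model_cloglog <- glm(loan_status ~ age + emp_cat + ir_cat + loan_amnt,
                         family = binomial(link = cloglog), data = training_set)

# Make predictions for all models using the test set
# (type = "response" returns predicted probabilities rather than linear predictors)
predictions_logit <- predict(log_model_logit, newdata = test_set, type = "response")
predictions_probit <- predict(log_model_probit, newdata = test_set, type = "response")
predictions_cloglog <- predict(log_model_cloglog, newdata = test_set, type = "response")

# Use a cut-off of 14% to make binary prediction vectors
cutoff <- 0.14
class_pred_logit <- ifelse(predictions_logit > cutoff, 1, 0)
class_pred_probit <- ifelse(predictions_probit > cutoff, 1, 0)
class_pred_cloglog <- ifelse(predictions_cloglog > cutoff, 1, 0)

# Make a confusion matrix for the three models (rows: observed, columns: predicted)
tab_class_logit <- table(true_val, class_pred_logit)
tab_class_probit <- table(true_val, class_pred_probit)
tab_class_cloglog <- table(true_val, class_pred_cloglog)

# Compute the classification accuracy for all three models
acc_logit <- sum(diag(tab_class_logit)) / nrow(test_set)
acc_probit <- sum(diag(tab_class_probit)) / nrow(test_set)
acc_cloglog <- sum(diag(tab_class_cloglog)) / nrow(test_set)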
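
The introduction also mentions sensitivity, which the sample code does not compute. The following is a minimal sketch of how accuracy and sensitivity could be read off a confusion matrix, assuming that 1 codes a default and 0 a non-default. The toy vectors true_val_toy and class_pred_toy are hypothetical stand-ins for true_val and a class prediction vector such as class_pred_logit.

```r
# Toy vectors standing in for true_val and class_pred_logit (hypothetical values)
true_val_toy   <- c(0, 0, 0, 0, 1, 1, 1, 0, 1, 0)
class_pred_toy <- c(0, 1, 0, 0, 1, 0, 1, 0, 1, 1)

# Fix the factor levels so the table stays 2 x 2 even if one class is never predicted
tab_toy <- table(factor(true_val_toy, levels = c(0, 1)),
                 factor(class_pred_toy, levels = c(0, 1)),
                 dnn = c("actual", "predicted"))

# Accuracy: correctly classified cases over all cases
acc_toy <- sum(diag(tab_toy)) / sum(tab_toy)

# Sensitivity: correctly predicted defaults over all actual defaults
sens_toy <- tab_toy["1", "1"] / sum(tab_toy["1", ])

acc_toy   # 0.7
sens_toy  # 0.75
```

Fixing the factor levels is a design choice: with a high cut-off, a model may predict no defaults at all, in which case a plain table() call would drop a column and diag() would no longer pick out the correct cells.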