Get startedGet started for free

ROC-curves for comparison of logistic regression models

ROC-curves can easily be created using the pROC-package in R. Let's have a look if there is a big difference between ROC-curves for the four logistic regression-models previously used throughout this course. A small heads up:

  • predictions_logit contains probability of default (PD) predictions using the default logit link and containing variables age, emp_cat, ir_cat and loan_amnt.
  • predictions_probit contains PD-predictions using the probit and containing variables age, emp_cat, ir_cat and loan_amnt.
  • predictions_cloglog contains PD-predictions using the cloglog link and containing variables age, emp_cat, ir_cat and loan_amnt.
  • predictions_all_full contains PD-predictions using the default logit link and containing all seven variables in the data set.

You will first draw the ROC-curves for these four models in one plot. Afterwards, you will look at the area under the curve.

This exercise is part of the course

Credit Risk Modeling in R

View Course

Exercise instructions

  • Load the pROC-package in your R-console.
  • Construct the ROC-objects for the four logistic regression models using function roc(response, predictor). Remember that the response is the loan status indicator in the test_set, which can be obtained through test_set$loan_status.
  • Use the previously created objects to construct ROC-curves. To draw them all on one plot, use plot() for the first ROC-curve drawn (for ROC_logit), and use [lines()](https://www.rdocumentation.org/packages/graphics/functions/lines to add the ROC-curves) for the other three models to the same plot.
  • Use the col-argument to change the color of the curve of ROC_probit to "blue", ROC_cloglog to "red" and ROC_all_full to "green". Note that, in contrast with what has been discussed in the video, the x-axis label is Specificity and not "1-Specificity", resulting in an axis that goes from 1 on the left-hand side to 0 on the right-hand side.
  • It seems that the link function does not have a big impact on the ROC here, and the main trigger of a better ROC is the inclusion of more variables in a model. To get an exact idea of the performance of the ROC-curves, have a look at the AUC's, using function auc().

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Load the pROC-package


# Construct the objects containing ROC-information
ROC_logit <- roc(test_set$loan_status, predictions_logit)
ROC_probit <- 
ROC_cloglog <-
ROC_all_full <- 

# Draw all ROCs on one plot
plot(___)
lines(___, col=___)
lines(___, col=___)
lines(___, col=___)

# Compute the AUCs



Edit and Run Code