ROC-curves for comparison of logistic regression models
ROC-curves can easily be created using the pROC-package in R. Let's see whether there is a big difference between the ROC-curves of the four logistic regression models used previously throughout this course. A small heads up:
- `predictions_logit` contains probability of default (PD) predictions using the default logit link and containing variables `age`, `emp_cat`, `ir_cat` and `loan_amnt`.
- `predictions_probit` contains PD-predictions using the probit link and containing variables `age`, `emp_cat`, `ir_cat` and `loan_amnt`.
- `predictions_cloglog` contains PD-predictions using the cloglog link and containing variables `age`, `emp_cat`, `ir_cat` and `loan_amnt`.
- `predictions_all_full` contains PD-predictions using the default logit link and containing all seven variables in the data set.
You will first draw the ROC-curves for these four models in one plot. Afterwards, you will look at the area under the curve.
This exercise is part of the course
Credit Risk Modeling in R
Exercise instructions
- Load the pROC-package in your R-console.
- Construct the ROC-objects for the four logistic regression models using the function `roc(response, predictor)`. Remember that the response is the loan status indicator in the `test_set`, which can be obtained through `test_set$loan_status`.
- Use the previously created objects to construct ROC-curves. To draw them all on one plot, use `plot()` for the first ROC-curve drawn (for `ROC_logit`), and use [lines()](https://www.rdocumentation.org/packages/graphics/functions/lines) to add the ROC-curves for the other three models to the same plot.
- Use the `col`-argument to change the color of the curve of `ROC_probit` to `"blue"`, `ROC_cloglog` to `"red"` and `ROC_all_full` to `"green"`. Note that, in contrast with what has been discussed in the video, the x-axis label is "Specificity" and not "1-Specificity", resulting in an axis that goes from 1 on the left-hand side to 0 on the right-hand side.
- It seems that the link function does not have a big impact on the ROC here, and the main driver of a better ROC is the inclusion of more variables in the model. To get an exact idea of the performance of the ROC-curves, have a look at the AUCs, using the function `auc()`.
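To make the AUC in the last instruction more concrete, here is a minimal base-R sketch of what the number measures: the share of (default, non-default) pairs in which the defaulting loan received the higher predicted PD, with ties counted as one half. The helper `auc_by_pairs()` and the six toy observations are hypothetical and only serve as an illustration; they are not part of pROC or of the course data. For an empirical ROC-curve, this pairwise value should agree with the area that `auc()` reports.

```r
# Hypothetical helper for illustration only (not part of pROC).
# response:  0/1 vector, where 1 marks a default
# predictor: predicted probability of default for each observation
auc_by_pairs <- function(response, predictor) {
  cases    <- predictor[response == 1]  # PDs of loans that defaulted
  controls <- predictor[response == 0]  # PDs of loans that did not default
  # Compare every case with every control; a tie counts as one half
  cmp <- outer(cases, controls, function(p_case, p_ctrl) {
    (p_case > p_ctrl) + 0.5 * (p_case == p_ctrl)
  })
  mean(cmp)
}

# Toy data, made up for this sketch
status <- c(0, 0, 1, 0, 1, 1)
pd     <- c(0.05, 0.20, 0.15, 0.10, 0.40, 0.30)

auc_by_pairs(status, pd)  # 8 of the 9 pairs are ordered correctly: 0.8888889
```

An AUC of 0.5 corresponds to random ordering and 1 to perfect separation, which is why small differences between the four models are easier to read from the AUCs than from the overlapping curves.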
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Load the pROC-package
# Construct the objects containing ROC-information
ROC_logit <- roc(test_set$loan_status, predictions_logit)
ROC_probit <-
ROC_cloglog <-
ROC_all_full <-
# Draw all ROCs on one plot
plot(___)
lines(___, col=___)
lines(___, col=___)
lines(___, col=___)
# Compute the AUCs
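One possible way to complete the sample code is sketched below. Because `test_set` and the four prediction vectors only exist inside the course workspace, the sketch first simulates stand-ins with the same names so that it runs on its own; the simulated values are assumptions made for illustration and say nothing about the real models. In the course environment, skip the simulation part and use the objects that are already loaded.

```r
# Load the pROC-package
library(pROC)

# --- Simulated stand-ins (assumption: NOT the course data) ---
set.seed(42)
n <- 500
test_set <- data.frame(loan_status = rbinom(n, 1, 0.15))
signal <- test_set$loan_status
# Three similar score vectors and one slightly more informative one
predictions_logit    <- plogis(-2 + 1.0 * signal + rnorm(n))
predictions_probit   <- plogis(-2 + 1.0 * signal + rnorm(n))
predictions_cloglog  <- plogis(-2 + 1.0 * signal + rnorm(n))
predictions_all_full <- plogis(-2 + 1.5 * signal + rnorm(n))

# Construct the objects containing ROC-information
ROC_logit    <- roc(test_set$loan_status, predictions_logit)
ROC_probit   <- roc(test_set$loan_status, predictions_probit)
ROC_cloglog  <- roc(test_set$loan_status, predictions_cloglog)
ROC_all_full <- roc(test_set$loan_status, predictions_all_full)

# Draw all ROCs on one plot
plot(ROC_logit)
lines(ROC_probit, col = "blue")
lines(ROC_cloglog, col = "red")
lines(ROC_all_full, col = "green")

# Compute the AUCs
auc(ROC_logit)
auc(ROC_probit)
auc(ROC_cloglog)
auc(ROC_all_full)
```

If you prefer the "1-Specificity" axis from the video, pROC's plot method accepts `legacy.axes = TRUE`, for example `plot(ROC_logit, legacy.axes = TRUE)`.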