Wrap-up and remarks

1. Wrap-up and remarks

You have now learned how to build a logistic regression model and evaluate the result through the choice of a cutoff. Before finishing this chapter, I'd like to make two additional remarks.

2. Best cut-off for accuracy?

The first remark relates to the choice of an "optimal" cutoff value, and the second is a more general remark on logistic regression models. Let's have a look at how classification accuracy changes with different cutoff values for our full logistic regression model.

3. Best cut-off for accuracy?

There is a steep increase in accuracy up to a cutoff of around 25%,

4. Best cut-off for accuracy?

after that, there is a slight increase until a cutoff of 51%, and

5. Best cut-off for accuracy?

for all cutoffs greater than 51%, the accuracy doesn't change any more. It is important to stress that this pattern of accuracy increasing with the cutoff is very typical for credit risk modeling and, more generally, for any logistic regression model fit to unbalanced groups (far more ones than zeros, or vice versa). Looking at accuracy alone, you would be tempted to use a cutoff above 51% here, as it leads to the best accuracy.
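This effect is easy to reproduce on toy data. The sketch below (in Python for illustration, with made-up labels and predicted probabilities; the dataset and scores are not from the course) shows accuracy climbing with the cutoff on an unbalanced sample of 90 non-defaults and 10 defaults:

```python
import numpy as np

# Hypothetical, unbalanced example: 90 non-defaults (0) and 10 defaults (1),
# with predicted default probabilities concentrated in the lower range.
rng = np.random.default_rng(0)
y_true = np.array([0] * 90 + [1] * 10)
probs = np.concatenate([rng.uniform(0.0, 0.4, 90),   # non-defaults: low scores
                        rng.uniform(0.1, 0.6, 10)])  # defaults: somewhat higher

for cutoff in [0.10, 0.25, 0.50, 0.70]:
    y_pred = (probs > cutoff).astype(int)  # classify as default above the cutoff
    accuracy = (y_pred == y_true).mean()
    print(f"cutoff {cutoff:.2f}: accuracy {accuracy:.3f}")
```

Because no score exceeds 0.70 here, the highest cutoff classifies everything as non-default, and the accuracy simply equals the share of actual non-defaults (90%).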

6. Best cut-off for accuracy?

As shown before, however, the high accuracy is only due to the fact that for a cutoff greater than 51%, all cases are classified as non-defaults.

7. What about sensitivity or specificity?

Now let's have a look at sensitivity and specificity. Unlike the behavior of accuracy, which depends on the data at hand, the strictly increasing nature of specificity and the strictly decreasing nature of sensitivity hold in general.

8. What about sensitivity or specificity?

Taking a cutoff of 0, all cases will be classified as defaults, leading to a sensitivity of 100% but a specificity of 0%.

9. What about sensitivity or specificity?

At the other extreme, taking a cutoff of 1, all cases will be classified as non-defaults, leading to a specificity of 100% but a sensitivity of 0. This trade-off between sensitivity and specificity always exists. We'll get back to the specification of a cutoff in the fourth chapter of this course.
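The two extremes can be checked directly. A minimal sketch (in Python for illustration; the labels and scores are made up, and `sens_spec` is a hypothetical helper, not a course function):

```python
import numpy as np

def sens_spec(y_true, y_pred):
    """Sensitivity = true positive rate; specificity = true negative rate."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return float(tp / (tp + fn)), float(tn / (tn + fp))

y_true = np.array([0, 0, 0, 1, 1])           # toy labels: 1 = default
probs = np.array([0.1, 0.3, 0.4, 0.2, 0.6])  # toy predicted probabilities

# Cutoff 0: everything classified as default -> sensitivity 1.0, specificity 0.0
print(sens_spec(y_true, (probs > 0.0).astype(int)))
# Cutoff 1: everything classified as non-default -> sensitivity 0.0, specificity 1.0
print(sens_spec(y_true, (probs >= 1.0).astype(int)))
```

Any cutoff between the extremes trades some sensitivity for some specificity, which is why the choice deserves its own discussion later in the course.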

10. About logistic regression…

To finalize this chapter, I would like to mention that the logistic regression model we've used up to now is also known as the logistic regression model with a logit link. This is the default in R, but it can also be requested explicitly by setting link = "logit" in the binomial family of glm(). Using the expression at the bottom of the slide, you can then compute the probability of default.
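For a logit link, the probability of default follows from the fitted linear predictor through the logistic function, P = 1 / (1 + exp(-eta)). A quick numeric sketch (in Python for illustration; the coefficients and borrower values below are invented, not estimates from the course data):

```python
import math

# Hypothetical fitted coefficients (illustrative values only)
intercept = -2.5
beta_age, beta_loan = 0.01, 0.4
age, loan = 35, 1.2  # made-up borrower characteristics

# Linear predictor, then the inverse logit (logistic) transform
eta = intercept + beta_age * age + beta_loan * loan
pd_logit = 1 / (1 + math.exp(-eta))
print(f"linear predictor: {eta:.3f}, probability of default: {pd_logit:.3f}")
```

Note that the logistic transform always returns a value strictly between 0 and 1, so the result can safely be interpreted as a probability.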

11. Other logistic regression models

Alternatives exist, such as the probit and cloglog link functions. Although I won't discuss these variations in detail, they're worth mentioning. As shown here, for these models too, negative parameter estimates lead to a decrease in default probability and positive parameter estimates to an increase. However, the function that maps the parameter estimates to the actual probability of default changes, and is slightly more complex. Nonetheless, predictions are still very easy to obtain using R.

12. Let's practice!

Now, let's finish this chapter with some exercises!