Evaluating the logistic regression model result

1. Evaluating the logistic regression model result

Now that you've learned how to predict probabilities of default, it's time to evaluate the result.

2. Recap: model evaluation

Using the predicted probabilities, you would like to make a confusion matrix. As you saw in the first chapter, you can make a confusion matrix comparing the loan_status column in the test set with model predictions for loan status.

3. In reality...

The problem is that a predicted probability of default lies somewhere between zero and one. How will we make a confusion matrix now?

4. In reality...

The answer lies in the specification of a threshold value, or cutoff. We will need to choose a value between 0 and 1: if the predicted probability lies above this cutoff, the prediction is set to 1; if not, the prediction is set to 0.
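To make the rule concrete, here is a minimal Python sketch of applying a cutoff. The function name apply_cutoff and the probabilities are made up for illustration; they are not taken from the course data.

```python
def apply_cutoff(probabilities, cutoff):
    """Return 1 where the predicted probability lies above the cutoff, else 0."""
    return [1 if p > cutoff else 0 for p in probabilities]

# Made-up predicted probabilities of default (illustration only)
example_probs = [0.03, 0.12, 0.45, 0.08, 0.61]

example_preds = apply_cutoff(example_probs, 0.5)
print(example_preds)  # [0, 0, 0, 0, 1]
```

Only the last case has a predicted probability above 0.5, so it is the only one predicted as a default.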

5. Cutoff = 0.5

A logical cutoff value might be 0.5: if default is more likely than not, that is, if the predicted probability is greater than 0.5, the prediction is set to 1; if not, it is set to 0.

6. Cutoff = 0.5

Let's have a look at the confusion matrix for the subset of these 14 cases. Looking at this small subset, it becomes clear that this cutoff will lead to no predicted defaults at all, giving an all-zero column here. Thus we have a sensitivity of 0%, with none of the actual defaults correctly classified. As you have seen in the previous exercises, it is not uncommon to have rather low predicted probabilities of default, as loan default is a rare event. Therefore, care must be taken when deciding on a cutoff value, rather than simply setting it to 0.5.
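The effect can be reproduced with a small sketch. The eight cases below are hypothetical and are not the 14 cases on the slide; the helper names confusion_counts and sensitivity are also made up for this example. All predicted probabilities are low, as is typical for a rare event.

```python
def confusion_counts(actual, predicted):
    """Count true/false negatives and positives for 0/1 labels."""
    counts = {"TN": 0, "FP": 0, "FN": 0, "TP": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 1 and p == 0:
            counts["FN"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1
        else:
            counts["TN"] += 1
    return counts

def sensitivity(counts):
    """Share of actual defaults that were predicted as defaults."""
    actual_defaults = counts["TP"] + counts["FN"]
    return counts["TP"] / actual_defaults

# Hypothetical test-set values (illustration only): 1 = default, 0 = no default
actual_status = [0, 0, 0, 0, 0, 0, 1, 1]
predicted_probs = [0.02, 0.04, 0.05, 0.07, 0.12, 0.15, 0.11, 0.20]

# Apply a cutoff of 0.5: no probability is above it, so nothing is predicted as default
predicted_status = [1 if p > 0.5 else 0 for p in predicted_probs]

counts_at_half = confusion_counts(actual_status, predicted_status)
print(counts_at_half)               # {'TN': 6, 'FP': 0, 'FN': 2, 'TP': 0}
print(sensitivity(counts_at_half))  # 0.0
```

Both actual defaults end up as false negatives, which is exactly what a sensitivity of 0% means.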

7. Cutoff = 0.1

Let's have a look at the confusion matrix for a cutoff of 0.1. This cutoff leads to a significantly better sensitivity, whereas the classification accuracy doesn't change.
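A short sketch comparing the two cutoffs side by side. The data are hypothetical and were chosen so that accuracy happens to stay the same at both cutoffs, mirroring the slide; on other data, lowering the cutoff can move accuracy in either direction. The function name evaluate_cutoff is an assumption made for this example.

```python
def evaluate_cutoff(actual, probabilities, cutoff):
    """Return (sensitivity, accuracy) for 0/1 labels at the given cutoff."""
    predicted = [1 if p > cutoff else 0 for p in probabilities]
    true_pos = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    actual_defaults = sum(actual)
    return true_pos / actual_defaults, correct / len(actual)

# Hypothetical test-set values (illustration only): 1 = default, 0 = no default
actual_status = [0, 0, 0, 0, 0, 0, 1, 1]
predicted_probs = [0.02, 0.04, 0.05, 0.07, 0.12, 0.15, 0.11, 0.20]

results = {}
for cutoff in (0.5, 0.1):
    results[cutoff] = evaluate_cutoff(actual_status, predicted_probs, cutoff)
    sens, acc = results[cutoff]
    print(f"cutoff={cutoff}: sensitivity={sens:.2f}, accuracy={acc:.2f}")
# cutoff=0.5: sensitivity=0.00, accuracy=0.75
# cutoff=0.1: sensitivity=1.00, accuracy=0.75
```

At the lower cutoff both defaults are caught, but two non-defaults are now flagged as defaults. In this made-up example the two extra correct predictions and the two new false positives cancel out, so accuracy is unchanged.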

8. Let's practice!

Let's experiment with some cutoff values!