Finding the right cut-off: the strategy curve

1. Finding the right cut-off: the strategy curve

In the previous chapters, we gave you an overview of how logistic regression models and decision trees can be used to compute the probability of default.

2. Constructing a confusion matrix

Using the function predict() on a logistic regression model with type = "response", you get a vector with the predicted probabilities of default. Using predict() on a tree model, you get a matrix with two columns instead: the numbers in the left-hand column represent the probability of non-default, and the numbers in the right-hand column represent the probability of default. Select the second column to make these predictions comparable to the predictions from a logistic regression model.
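As a rough illustration of the two prediction formats, here is a minimal sketch on simulated data. The data frame loans, its columns int_rate and loan_status, and all model object names are made up for this example; the tree is fitted with the rpart package, which is an assumption about the tree implementation in use.

```r
library(rpart)  # assumed tree package; ships with standard R installations

set.seed(1)
n <- 500
# Hypothetical data: one predictor and a 0/1 default flag
loans <- data.frame(int_rate = runif(n, 5, 20))
loans$loan_status <- rbinom(n, 1, plogis(-4 + 0.2 * loans$int_rate))
train <- loans[1:350, ]
test  <- loans[351:n, ]

# Logistic regression: type = "response" returns a vector of probabilities
logit_model <- glm(loan_status ~ int_rate, family = "binomial", data = train)
pred_logit  <- predict(logit_model, newdata = test, type = "response")

# Classification tree: predict() returns a two-column probability matrix
tree_model <- rpart(loan_status ~ int_rate, method = "class", data = train)
prob_tree  <- predict(tree_model, newdata = test)

# Second column = probability of default, comparable to pred_logit
pred_tree <- prob_tree[, 2]
```

Both pred_logit and pred_tree are now plain vectors with one predicted probability of default per test-set applicant.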

3. Cut-off?

Until now, we have set cutoffs based on best guesses or gut feeling in order to make confusion matrices for each of the models used. The choice of a cutoff is important, however, as it changes the validation metrics. Additionally, when these models are applied to future applicants, the cutoff can be a way of deciding which loan applicants will get a loan and which will not. It is clear that these models are never perfect, and no matter how many applicants a bank refuses, there will still be debtors that default. The good thing is that the models can help a bank decide how many loans it should approve if it doesn't want to exceed a certain percentage of defaults in its portfolio of customers. Let's see how this works. We'll start with a simple example using the full logistic regression model from before. Let's assume the test set contains new applicants, and a bank decides to reject 20% of these new applicants based on their fitted probability of default.

4. A certain strategy

This means that the 20% with the highest predicted probability of default will be rejected. To obtain the cutoff value that would lead to a predicted "1" (or default) for 20% of the cases in the test set, you look at the 80% quantile of the predictions vector. Having used this cutoff, you know which test set loan applicants would have been rejected using an 80% acceptance rate.
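The quantile step can be sketched as follows. The predictions vector here is simulated rather than taken from a fitted model, and the variable names are chosen for illustration only.

```r
set.seed(42)
# Hypothetical predicted default probabilities for 1000 test-set applicants
predictions <- rbeta(1000, 2, 15)

# Reject the 20% with the highest predicted probability of default:
# the cutoff is the 80% quantile of the predictions vector
cutoff <- quantile(predictions, 0.8)

# 1 = predicted default (rejected), 0 = predicted non-default (accepted)
pred_class <- ifelse(predictions > cutoff, 1, 0)

# Share of applicants rejected under this cutoff
mean(pred_class)
```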

5. A certain strategy

Now we want to look at the true status of the loans that would have been accepted using this cutoff, and see what percentage of these accepted loans actually defaulted (this percentage is also referred to as the "bad rate"). Having accepted 80% of the loans, we see that the bad rate is equal to 8.97%.
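A minimal sketch of the bad rate calculation is given below. Both the predictions and the true loan statuses are simulated, so the resulting number will not match the figure quoted for the course data.

```r
set.seed(42)
n <- 1000
# Hypothetical predicted probabilities and true 0/1 default status
predictions <- rbeta(n, 2, 15)
true_status <- rbinom(n, 1, predictions)

# Accept the 80% of applicants with the lowest predicted probability
cutoff   <- quantile(predictions, 0.8)
accepted <- predictions <= cutoff

# Bad rate: share of accepted loans that actually defaulted
bad_rate <- sum(true_status[accepted]) / sum(accepted)
bad_rate
```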

6. The strategy table

If you repeat this for several acceptance rates, you can construct an entire table with acceptance rates, corresponding cutoff values, and the percentage of "bad loans" that are accepted. This is a useful tool for a bank because banks can adapt their acceptance rate depending on the percentage of bad loans they can allow in their loan portfolio. Suppose a bank can only allow an 8% bad rate. Using this model, a bank will decide to accept 65% of the loans.
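One possible way to build such a table is sketched below. The function name strategy_table, its arguments, and the grid of acceptance rates are assumptions made for this illustration, and the data are simulated.

```r
# Hypothetical helper: one row per acceptance rate, with the matching
# cutoff and the bad rate among the loans accepted at that cutoff
strategy_table <- function(predictions, true_status,
                           accept_rates = seq(1, 0.05, by = -0.05)) {
  cutoffs <- quantile(predictions, accept_rates, names = FALSE)
  bad_rates <- sapply(cutoffs, function(cutoff) {
    accepted <- predictions <= cutoff
    sum(true_status[accepted]) / sum(accepted)
  })
  data.frame(accept_rate = accept_rates,
             cutoff      = round(cutoffs, 4),
             bad_rate    = round(bad_rates, 4))
}

set.seed(42)
predictions <- rbeta(1000, 2, 15)            # simulated probabilities
true_status <- rbinom(1000, 1, predictions)  # simulated true status

tab <- strategy_table(predictions, true_status)
head(tab)

# Rows that respect a maximum allowed bad rate of 8%
tab[tab$bad_rate <= 0.08, ]
```

A bank with an 8% limit would then pick the highest acceptance rate among the rows returned by the last line, provided at least one row satisfies the limit.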

7. The strategy curve

The so-called strategy curve is the visual counterpart of the strategy table: it plots the bad rate as a function of the acceptance rate.
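A strategy curve of this kind could be drawn with base R graphics along the following lines. The data are simulated and the grid of acceptance rates is an arbitrary choice for the sketch.

```r
set.seed(42)
predictions <- rbeta(1000, 2, 15)            # simulated probabilities
true_status <- rbinom(1000, 1, predictions)  # simulated true status

# Bad rate among accepted loans for a grid of acceptance rates
accept_rates <- seq(1, 0.05, by = -0.05)
bad_rates <- sapply(accept_rates, function(rate) {
  accepted <- predictions <= quantile(predictions, rate)
  mean(true_status[accepted])
})

# Strategy curve: bad rate as a function of the acceptance rate
plot(accept_rates, bad_rates, type = "l",
     xlab = "Acceptance rate", ylab = "Bad rate",
     main = "Strategy curve")
```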

8. Let's practice!

Now, let's put what you've learned into practice!