Finding the optimal threshold

Imagine you are running a campaign aimed at preventing customers from defaulting. You can use your model's predictions to decide which customers to target, so the threshold at which a customer counts as a predicted defaulter is essential to your results. If you know the costs and rewards of the campaign, you can check empirically which threshold yields the highest payoff. In this exercise, we face the following scenario:

If a customer does not default because of our campaign, i.e. if we correctly predicted the default (true positive), we are rewarded with 1000€. If, however, we target a customer who would not have defaulted anyway, i.e. if we falsely predicted that the customer would default (false positive), we incur costs of 250€.
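The payoff rule in this scenario can be written as a small function. This is only a sketch: campaign_payoff is a hypothetical helper name, and the counts in the example call are made up for illustration, not taken from the exercise data.

```r
# Payoff rule from the scenario:
#   +1000 EUR for every true positive, -250 EUR for every false positive.
# campaign_payoff is a hypothetical helper, not part of any package.
campaign_payoff <- function(true_pos, false_pos, reward = 1000, cost = 250) {
  reward * true_pos - cost * false_pos
}

# Illustrative counts only: 30 correctly targeted defaulters,
# 80 customers targeted although they would not have defaulted.
example_payoff <- campaign_payoff(true_pos = 30, false_pos = 80)
print(example_payoff)  # 30 * 1000 - 80 * 250 = 10000
```

True negatives and false negatives do not enter the calculation, because the scenario assigns them neither a reward nor a cost.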

From the last exercise we know that the restricted model was the best one, so calculate the optimal threshold only for that model. The predictions are stored in the column predNew of the defaultData data frame. Use the SDMTools package.
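One way to search for the threshold is to compute the payoff for a grid of candidate values and keep the best one. The sketch below uses base R and a tiny made-up data set instead of defaultData and SDMTools, so that it runs on its own; obs, pred, payoff_at and the grid from 0.1 to 0.5 are all assumptions for illustration. It also assumes that a customer is flagged when the predicted probability is greater than or equal to the threshold.

```r
# Toy data, invented for illustration (not the exercise data):
# obs  = 1 if the customer actually defaulted, 0 otherwise
# pred = predicted default probability
obs  <- c(1, 0, 1, 0, 0, 1, 0, 0)
pred <- c(0.92, 0.41, 0.37, 0.22, 0.63, 0.14, 0.06, 0.28)

# Hypothetical helper: payoff of the campaign at a given threshold.
payoff_at <- function(threshold, obs, pred, reward = 1000, cost = 250) {
  flagged <- pred >= threshold           # customers the campaign targets
  true_pos  <- sum(flagged & obs == 1)   # targeted, would have defaulted
  false_pos <- sum(flagged & obs == 0)   # targeted, would not have defaulted
  reward * true_pos - cost * false_pos
}

# Evaluate every candidate threshold and collect the results.
thresholds <- seq(0.1, 0.5, by = 0.1)
payoffs <- sapply(thresholds, payoff_at, obs = obs, pred = pred)
result <- data.frame(threshold = thresholds, payoff = payoffs)
print(result)

# Row with the highest payoff.
best <- result[which.max(result$payoff), ]
print(best)
```

With the real data, obs and pred would be replaced by the observed default indicator and the predNew column, and the true and false positives could be read from the SDMTools confusion matrix at each threshold instead.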

To practice, construct a confusion matrix with a threshold of 0.5. Look at the matrix and recall where you can find the true positives and the false positives.
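As a rough illustration of where these cells sit, the following base R sketch builds a confusion matrix at a threshold of 0.5 from made-up data. The vectors obs and pred are assumptions, not the exercise data. The layout with predictions in rows and observations in columns is chosen here to match what SDMTools::confusion.matrix(obs, pred, threshold = 0.5) is expected to return; check the package documentation to confirm the layout before relying on it.

```r
# Toy data, invented for illustration (not the exercise data).
obs  <- c(1, 0, 1, 0, 0, 1, 0, 0)
pred <- c(0.92, 0.41, 0.37, 0.22, 0.63, 0.14, 0.06, 0.28)

# Classify at a threshold of 0.5; fixed factor levels keep the table 2 x 2
# even if one class happens to be absent.
pred_class <- factor(as.integer(pred >= 0.5), levels = c(0, 1))
obs_class  <- factor(obs, levels = c(0, 1))

# Rows: predicted class, columns: observed class.
conf_matrix <- table(pred = pred_class, obs = obs_class)
print(conf_matrix)

# True positives: predicted 1 and observed 1 (second row, second column).
true_pos <- conf_matrix["1", "1"]
# False positives: predicted 1 but observed 0 (second row, first column).
false_pos <- conf_matrix["1", "0"]
```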
