Evaluating model performance
1. Evaluating model performance
In the last lesson, you learned to build a predictive model using the training data. To know whether the model is any good, and to compare various models, we need to measure their performance. To do this, you first apply the model to the test data to obtain predictions for its observations. Then, by comparing the predictions to the actual labels, you can see how good the model is at predicting the target and thus evaluate the model performance. We consider two commonly used performance measures for churn prediction: AUC and top decile lift.

2. Making predictions
Now that we have built a model, we can use it to make predictions on new data. For example, we can apply the model to Cecilia's attributes and infer which programming language she prefers. In `R` we use the `predict` function, with the model and the test set as arguments. Note that the third argument differs slightly depending on whether the model is a logistic regression model or a random forest model. For logistic regression, `type` should equal `"response"`, and the result is a vector with the probabilities of being 1, in this case of preferring R. For random forests, we write `type = "prob"` to get the probabilities. The result is a matrix with two columns: the first is the probability of being 0 and the second the probability of being 1. You can see the random forest prediction for Cecilia in the R output. It seems that she has a higher probability of preferring R, at 86.4%. Note that usually, the test set contains more than just one observation.
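As a minimal sketch, assuming the fitted models from the previous lesson are stored in `logModel` and `rfModel` (hypothetical names) and the test data in `test_set`, the two calls look like this:

```r
# Hypothetical objects: logModel is a fitted glm, rfModel a fitted
# randomForest, test_set a data frame of test observations

# Logistic regression: type = "response" gives a vector of
# probabilities of the positive class (preferring R)
logPredictions <- predict(logModel, newdata = test_set, type = "response")

# Random forest: type = "prob" gives a two-column matrix;
# column 1 is the probability of 0, column 2 the probability of 1
rfPredictions <- predict(rfModel, newdata = test_set, type = "prob")

# Probability of preferring R for each test observation
rfPredictions[, 2]
```

3. AUC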
The area under the receiver operating characteristic curve, or AUC for short, is the probability that the predicted score of a randomly chosen churner, or R user, is higher than that of a randomly chosen non-churner, or Python user. AUC is typically a number between 0.5 and 1 and encapsulates the trade-off between the true and false positive rates. It is easy to compute in R. As on the previous slide, let the variable `logPredictions` be the predictions of a logistic regression model when applied to the test set. Then we use the `auc` function in the `pROC` package. The function has two arguments: the true labels of the test set, denoted here with `test_set$label`, and the predictions. The result is a single number between 0.5 and 1, where 0.5 is the performance of a random model and 1 the performance of a perfect model, so a higher value means a better model.
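Continuing the sketch above, with the hypothetical `logPredictions` vector:

```r
library(pROC)

# AUC: first argument is the vector of true labels,
# second the vector of predicted probabilities
auc(test_set$label, logPredictions)
```

4. Top decile lift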
Top decile lift, or simply lift, is another important performance measure for churn prediction. It computes the proportion of actual churners amongst the 10% of customers with the highest predicted churn probability. Thereby, it represents how much better a prediction model is at identifying churners, compared to a random sample of customers. A random model has a lift equal to one, and any model that is better than a random model thus has a higher lift. To compute the lift, start by sorting the predictions of the observations from highest to lowest score. Amongst the 10% of customers with the highest predicted probability of churn, find the proportion of actual churners, and divide by the churn rate of the whole dataset. Suppose that in the top 10% of the highest scores, 60% are churners. If, in the whole population, 10% are churners, then the lift is 60/10 = 6. In `R` you can use the function `TopDecileLift` in the package `lift` to compute the top decile lift, as you see here.
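A minimal sketch, again assuming the hypothetical `logPredictions` vector and a 0/1-coded `test_set$label`; the manual computation at the end simply restates the definition above:

```r
library(lift)

# Top decile lift from the lift package:
# predictions first, true labels (0/1) second
TopDecileLift(logPredictions, test_set$label)

# Equivalent manual computation: churn rate in the top 10% of
# scores divided by the churn rate of the whole test set
sorted_labels <- test_set$label[order(logPredictions, decreasing = TRUE)]
top10 <- sorted_labels[1:ceiling(0.1 * length(sorted_labels))]
mean(top10) / mean(test_set$label)
```

5. Let's practice!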
Now it is time for you to compare the performance of the churn prediction models.