1. The cumulative gains curve
Once your model is ready, you will be eager to present it to the business and show off its performance. Next, we will introduce visualizations of model performance that are relevant to business stakeholders.
2. Evaluation curves
Until now, we evaluated models using the AUC. Although this measure is very useful for data scientists, it is less suitable for discussing your models with business stakeholders. The AUC is a rather complex evaluation measure that is not very intuitive. Moreover, it is a single number that does not capture all the information about the model.
Instead, you can use evaluation curves such as the cumulative gains curve. This type of curve is easy to explain and can guide you toward better business decisions.
3. Cumulative gains construction
The cumulative gains curve is constructed as follows. First, order all observations by the output of the model, that is, by their predicted probability of being a target. On the left-hand side are the observations with the highest probability of being a target according to the model; on the right-hand side are those with the lowest probability.
The horizontal axis of the cumulative gains curve shows the percentage of observations considered. For instance, at 30%, the 30% of observations with the highest probability of being a target are considered.
The vertical axis shows the percentage of all targets contained in this group. For instance, if the cumulative gain is 70% at 30%, the top 30% of observations with the highest probability of being a target already contain 70% of all targets.
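The construction steps above can be sketched in a few lines of NumPy. The helper name `cumulative_gains` and the ten-observation toy data are hypothetical, chosen only to illustrate the computation: sort the observations by predicted probability, then divide the running count of targets by the total number of targets.

```python
import numpy as np

def cumulative_gains(y_true, scores):
    """Return (share of observations considered, share of all targets captured)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    # Order observations from highest to lowest predicted probability (left to right on the x-axis)
    order = np.argsort(-scores, kind="mergesort")
    sorted_targets = y_true[order]
    n = len(y_true)
    # x-axis: share of observations considered (top 0, top 1, ..., all n)
    pct_observations = np.arange(0, n + 1) / n
    # y-axis: share of all targets found among the top-k observations, with 0 prepended
    pct_targets = np.concatenate([[0], np.cumsum(sorted_targets)]) / sorted_targets.sum()
    return pct_observations, pct_targets

# Hypothetical toy data: 10 observations, 4 of which are targets
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.75, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1])

x, y = cumulative_gains(y_true, scores)
print(x[3], y[3])  # → 0.3 0.5
```

Here the top 30% of observations (the three highest scores) contain 2 of the 4 targets, so the cumulative gain at 30% is 50%.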
4. Cumulative gains interpretation
The cumulative gains curve is a great tool for comparing models: the closer the curve lies to the upper-left corner, the better the model. A model that ranks observations at random follows the diagonal, because the top x% of observations then contains roughly x% of the targets.
Two models often produce curves that cross each other, in which case it is not straightforward to decide which model is best. In this example, you could say that model 2 is better at distinguishing the top 10% of observations from the rest, while model 1 is better at distinguishing the top 70% of observations from the rest.
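To make the crossing-curves comparison concrete, the sketch below reads off the cumulative gain of two models at 10% and 70% of the observations. The helper `gain_at` and the scores of `model_1` and `model_2` are hypothetical toy data, constructed so that the curves cross in the way described above.

```python
import numpy as np

def gain_at(y_true, scores, fraction):
    """Share of all targets found in the top `fraction` of observations."""
    y_true = np.asarray(y_true)
    k = int(round(fraction * len(y_true)))
    # Indices of the k observations with the highest scores
    top_k = np.argsort(-np.asarray(scores), kind="mergesort")[:k]
    return y_true[top_k].sum() / y_true.sum()

# Hypothetical data: observations 0, 1 and 2 are the targets
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
model_1 = [0.8, 0.7, 0.6, 0.9, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]  # misses at the very top
model_2 = [0.95, 0.2, 0.05, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.1]  # strong top pick only

for fraction in (0.1, 0.7):
    g1 = gain_at(y_true, model_1, fraction)
    g2 = gain_at(y_true, model_2, fraction)
    print(f"top {fraction:.0%}: model 1 = {g1:.2f}, model 2 = {g2:.2f}")
# → top 10%: model 1 = 0.00, model 2 = 0.33
# → top 70%: model 1 = 1.00, model 2 = 0.33
```

Which model to prefer then depends on the business question: if only the top 10% will be contacted, model 2 wins; if the top 70% will be, model 1 does.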
5. Cumulative gains in Python
Constructing cumulative gains curves in Python has been made easy with the scikitplot module.
First, import this module together with the matplotlib pyplot module. Then, you can use the function plot_cumulative_gain (found in scikitplot.metrics), which takes two arguments. The first argument is an array with the true values of the target; the second contains the model's predictions for the observations. Note that the predictions must include both the probability that the target is 1 and the probability that it is 0, that is, one column per class.
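The snippet below is a sketch of how the two inputs could be prepared with scikit-learn, assuming a logistic regression fitted on synthetic data from `make_classification` in place of your own data. `predict_proba` returns one column per class, in the order given by `model.classes_`, which is exactly the two-column format described above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary data (about 20% targets) stands in for a real basetable
X, y = make_classification(n_samples=1000, n_features=5, weights=[0.8, 0.2], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LogisticRegression()
model.fit(X_train, y_train)

# One column per class, ordered as in model.classes_:
# column 0 = probability that the target is 0, column 1 = probability that it is 1
predictions = model.predict_proba(X_test)
print(model.classes_, predictions.shape)  # → [0 1] (300, 2)
```

With scikit-plot installed, `y_test` and `predictions` are the two arguments to pass: `skplt.metrics.plot_cumulative_gain(y_test, predictions)` after `import scikitplot as skplt`, followed by `plt.show()`. Passing only `predictions[:, 1]` would not match the expected two-column input.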
6. Cumulative gains in Python
The output of this code is shown here. Observe that there are two curves; the one we are interested in is the curve for class 1, because we want to predict targets.
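The two curves can be reproduced without scikitplot, as a sketch of the idea rather than scikitplot's own code: for each class, that class is treated as the target and the observations are ranked by that class's probability column. The data and the `gains_for_class` helper are hypothetical. The class 1 curve is the one to read; the class 0 curve answers the opposite question of how well the model finds non-targets.

```python
import matplotlib.pyplot as plt
import numpy as np

def gains_for_class(y_true, class_scores, positive_class):
    """Cumulative gains when `positive_class` is treated as the target."""
    is_target = (np.asarray(y_true) == positive_class).astype(int)
    # Rank observations by this class's predicted probability, highest first
    order = np.argsort(-np.asarray(class_scores), kind="mergesort")
    pct_obs = np.arange(len(is_target) + 1) / len(is_target)
    pct_targets = np.concatenate([[0], np.cumsum(is_target[order])]) / is_target.sum()
    return pct_obs, pct_targets

# Hypothetical data: about 20% targets, which tend to get a higher class-1 probability
rng = np.random.default_rng(0)
y_true = (rng.random(200) < 0.2).astype(int)
p1 = 0.3 * y_true + 0.7 * rng.random(200)
predictions = np.column_stack([1 - p1, p1])  # same two-column layout as predict_proba

fig, ax = plt.subplots()
for cls in (0, 1):
    x, y = gains_for_class(y_true, predictions[:, cls], cls)
    ax.plot(x, y, label=f"Class {cls}")
ax.plot([0, 1], [0, 1], "k--", label="Baseline (random model)")
ax.set_xlabel("Percentage of observations")
ax.set_ylabel("Percentage of targets (gain)")
ax.legend()
```

In an interactive session, call `plt.show()` to display the figure.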
7. Let's practice!
Are you ready to make your cumulative gains curve?