1. Model performance tradeoffs
As the previous exercise illustrated, rare events create challenges for classification models. When one outcome is very rare, predicting the opposite can result in a very high accuracy. This was the case in the donations dataset; because only about 5% of people were donors, predicting non-donation resulted in an overall accuracy of 95% but an accuracy of zero on the outcome that mattered the most: the donations.
In cases like these, it may be necessary to sacrifice a bit of overall accuracy in order to better target the outcome of interest.
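To make that concrete, here is a minimal sketch of the accuracy paradox in Python with scikit-learn, using synthetic labels that stand in for the donations data (the course's own exercises may use different tooling):

```python
# A minimal sketch of the accuracy paradox on imbalanced data.
# The ~5% donor rate mirrors the donations example described above;
# the labels here are synthetic, not the actual course dataset.
import numpy as np
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(seed=42)

# 1 = donor (about 5% of cases), 0 = non-donor
actual = (rng.random(10_000) < 0.05).astype(int)

# A "model" that always predicts non-donation
predicted = np.zeros_like(actual)

print(f"Overall accuracy: {accuracy_score(actual, predicted):.2%}")  # ~95%
print(f"Accuracy on donors: {predicted[actual == 1].mean():.2%}")    # 0%
```

Despite the impressive-looking 95% overall accuracy, this model identifies exactly zero of the donors it was meant to find.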
2. Understanding ROC curves
A visualization called an ROC curve provides a way to better understand a model's ability to distinguish between positive and negative cases, that is, the outcome of interest versus all others.
To understand how it works, imagine that you are working on a project where the positive outcome is 'X' and the negative outcome is 'O'. The classifier is trying to distinguish between the two. If the classifier is poor, the X's and O's will remain very mixed, as shown here.
The ROC curve depicts the relationship between the percentage of positive examples identified and the percentage of negative examples mistakenly included along the way, that is, the true positive rate versus the false positive rate. Here, because the X's and O's are evenly mixed, the ROC curve is a diagonal line, showing that the proportion of interesting examples rises at the same rate as the proportion of negative examples.
On the other hand, suppose we have a machine learning model that is able to sort the examples of interest so they appear at the front of the dataset. The outcomes might be arranged as shown here, with more X's on the left than on the right.
When the ROC curve is drawn for this arrangement, it rises above the diagonal, because the model identifies several positive examples for each negative example it accidentally prioritizes.
The diagonal line represents the baseline performance of a very poor model. The further a curve rises above this diagonal, the better the model is performing. Conversely, a model whose curve stays very close to the diagonal is not performing well at all.
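As a sketch of how such a curve is drawn in practice, the example below fits an illustrative model on synthetic imbalanced data and plots its ROC curve against the diagonal baseline (Python and scikit-learn are assumed here; this is not the course's donations model):

```python
# A sketch of drawing an ROC curve from predicted probabilities,
# using scikit-learn and matplotlib on illustrative synthetic data.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

# Imbalanced toy data: ~5% positives, like the donations example
X, y = make_classification(n_samples=5_000, weights=[0.95], random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)
prob = model.predict_proba(X)[:, 1]  # predicted probability of the positive class

# False positive rate vs. true positive rate at every threshold
fpr, tpr, _ = roc_curve(y, prob)

plt.plot(fpr, tpr, label="model")
plt.plot([0, 1], [0, 1], linestyle="--", label="baseline (random)")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```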
3. Area under the ROC curve
To quantify this performance, a measurement called AUC, or Area Under the Curve, is used. The AUC literally measures the area under the ROC curve.
The baseline model that is no better than random chance has an AUC of 0.50, because its diagonal line divides the 1-by-1 unit square perfectly in half.
A perfect model has an AUC of 1.00, with a curve that reaches all the way into the upper-left corner of the square.
Most real-world results fall somewhere in between. Generally speaking, the closer the AUC is to 1.00, the better, but there are some cases where AUC can be misleading.
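The example below sketches both extremes with scikit-learn's roc_auc_score, using synthetic labels and scores: random scores land near 0.50, while scores that rank every positive example first reach 1.00:

```python
# A sketch of the baseline and perfect AUC values described above;
# the labels and scores are synthetic illustrations.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(seed=1)
actual = rng.integers(0, 2, size=10_000)     # true 0/1 outcomes

random_scores = rng.random(10_000)           # no better than chance
perfect_scores = actual.astype(float)        # ranks every positive first

print(roc_auc_score(actual, random_scores))   # close to 0.50
print(roc_auc_score(actual, perfect_scores))  # exactly 1.00
```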
4. Using AUC and ROC appropriately
Curves of varying shapes can have the same AUC value. For this reason, it is important to look not only at the AUC but also at the shape of the curve, which indicates how a model performs across the full range of predictions.
For example, one model may do extremely well at identifying a few easy cases at first but perform poorly on more difficult cases. Another model may do just the opposite. As this figure shows, both may end up with the same AUC.
ROC curves are an important tool for comparing models and selecting the best one for your specific project needs. Used with a single model, they can help visualize the tradeoff between true positives and false positives for the outcome of interest.
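A minimal sketch of such a comparison, assuming two illustrative scikit-learn models on synthetic data rather than the course's donations model, is to overlay each model's ROC curve with its AUC in the legend:

```python
# A sketch of comparing two models' ROC curves and AUC values
# side by side; both models and the data are illustrative.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5_000, weights=[0.95], random_state=0)

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("decision tree", DecisionTreeClassifier(max_depth=3))]:
    prob = model.fit(X, y).predict_proba(X)[:, 1]
    fpr, tpr, _ = roc_curve(y, prob)
    plt.plot(fpr, tpr, label=f"{name} (AUC = {roc_auc_score(y, prob):.2f})")

plt.plot([0, 1], [0, 1], linestyle="--", label="baseline")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```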
5. Let's practice!
In the next exercise, you'll have the opportunity to plot an ROC curve for the donation data, to see how well the model is really performing.