
Area under the curve (AUC)

1. Area under the curve (AUC)

Just looking at a ROC curve starts to give us a good idea of how to evaluate whether or not our predictive model is any good.

2. From ROC to AUC

One interesting observation is that models with random predictions tend to produce curves that closely follow the diagonal line. On the other hand, models with a classification threshold that allows for perfect separation of classes produce a "box" with a single point at (0,1), with the false positive rate on the x-axis and the true positive rate on the y-axis, to represent a model where it is possible to achieve a 100% true positive rate and 0% false positive rate. Wouldn't that be nice? Continuing with this example, if we calculate the area under each of these 2 ROC curves, an interesting property emerges: the area under the curve for a perfect model is exactly 1, as our plot represents a 1 by 1 square, and the average area under the curve for a random model is 0.5, as the diagonal line cuts that square in half.
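To make the two areas concrete, here is a minimal Python sketch that applies the trapezoid rule to both curves. The function name `trapezoid_area` and the two point lists are illustrative assumptions, not part of the course material; each point is written as (false positive rate, true positive rate).

```python
def trapezoid_area(points):
    """Area under a piecewise-linear curve given (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Each segment contributes a trapezoid: width times average height.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Points are (false positive rate, true positive rate).
perfect_roc = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # the "box"
random_roc = [(0.0, 0.0), (1.0, 1.0)]               # the diagonal

perfect_auc = trapezoid_area(perfect_roc)
random_auc = trapezoid_area(random_roc)
print(perfect_auc, random_auc)
```

The box encloses the whole unit square, while the diagonal encloses only the triangle beneath it.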

3. Defining AUC

We can use this insight to formalize a measure of model accuracy known as "AUC" or "area under the curve." This metric is calculated based on the ROC curve plot, and is extremely useful. It's a single-number summary of the model's accuracy that does not require us to manually evaluate confusion matrices. This number summarizes the model's performance across all possible classification thresholds, and is a single metric we can use to rank different models on the same dataset.
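As a rough illustration of what "across all possible classification thresholds" means, the following Python sketch sweeps every distinct score as a threshold, records one ROC point per threshold, and sums the area under the resulting curve. The helper names `roc_points` and `auc`, along with the toy labels and scores, are assumptions made up for this example; this is not how caret computes the value internally.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points, one per distinct threshold, strictest first."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # a threshold above every score predicts no positives
    for threshold in sorted(set(scores), reverse=True):
        # Predict "positive" for every score at or above the threshold.
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(labels, scores):
    """Trapezoid-rule area under the ROC curve built by roc_points."""
    pts = roc_points(labels, scores)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical data: 1 = positive class, 0 = negative class.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_auc = auc(labels, scores)
print(round(example_auc, 3))
```

Each threshold corresponds to one confusion matrix, so the single number replaces inspecting all of them by hand.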

4. Defining AUC

It ranges from 0 to 1, where 0.5 is the AUC of a random model and 1.0 is the AUC of a perfect model. (A perfectly anti-predictive model would have an AUC of 0, but that rarely happens.) In practice most models fall between 0.5 and 1.0, while a really bad model can occasionally be in the 0.4 range. As a very rough rule of thumb, AUC can be thought of as a letter grade, where 0.9 is an "A", 0.8 is a "B", 0.7 is a "C", 0.5 is an "F", and so on. I'm generally happy with a model that has an AUC of 0.8 or higher, and models in the 0.7 range are often useful.
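The endpoints of that range can be checked with a small Python sketch. It relies on a standard equivalence that the transcript does not state: AUC equals the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. The function `pairwise_auc` and the toy scores are hypothetical names and data chosen for illustration.

```python
def pairwise_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


labels = [1, 1, 1, 0, 0, 0]
perfect_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
flipped_scores = [1.0 - s for s in perfect_scores]  # perfectly anti-predictive
constant_scores = [0.5] * 6                         # carries no information

perfect_auc = pairwise_auc(labels, perfect_scores)
flipped_auc = pairwise_auc(labels, flipped_scores)
constant_auc = pairwise_auc(labels, constant_scores)
print(perfect_auc, flipped_auc, constant_auc)
```

Reversing a model's scores turns an AUC of a into 1 - a, which is why a score well below 0.5 usually signals flipped labels or predictions rather than a merely weak model.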

5. Let's practice!

Fortunately, the caret package automates calculating the area under the ROC curve for us. Let's practice making use of this versatile metric.