
Applications of metric evaluation

1. Applications of metric evaluation

In chapter one, we briefly discussed why it is useful to look beyond accuracy alone when it comes to evaluation metrics. In this lesson, we will dive deeper into particular applications of other metrics.

2. Four categories of outcomes

As we briefly discussed in chapter one when covering the ROC curve, there are four categories of outcomes in classification: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). These four describe what our model predicts the target to be for a given example versus what it actually is. Our model predicts a 1 (the positive class) when it expects a click. For each of the four categories, the first part of the name (true or false) indicates whether the model was correct. The second part (positive or negative) indicates the label our model predicted. Therefore, a true positive is a case where the model predicted a 1 and the target was indeed 1, a false positive is a case where the model predicted a 1 but the target was actually 0, and so on.
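To make the four definitions concrete, here is a minimal sketch that counts them by hand. The function name `count_outcomes` and the small label lists are hypothetical, made up purely for illustration, and are not data from the course.

```python
def count_outcomes(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = click, 0 = no click)."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1  # predicted a click, and there was one
        elif predicted == 1 and actual == 0:
            fp += 1  # predicted a click, but there was none
        elif predicted == 0 and actual == 0:
            tn += 1  # predicted no click, and there was none
        else:
            fn += 1  # remaining case for binary labels: predicted 0, actual 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


# Hypothetical toy labels, for illustration only
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

outcomes = count_outcomes(y_true, y_pred)
print(outcomes)  # {'tp': 2, 'fp': 2, 'tn': 3, 'fn': 1}
```

Note that the four counts always add up to the total number of examples, since every prediction falls into exactly one category.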

3. Interpretations of four categories

Now let's interpret the four categories in the real world. In running an ad campaign, you need to purchase impressions. There is a complicated bidding process involved, but the basic idea is that if your model predicts an impression will result in a click, you bid on that impression, which costs you money. If the prediction is that there will be no click, then nothing is bought and hence there is no cost. Given this, the true positives represent the purchased impressions that were clicked. This is the desirable business outcome: money gained, since each click has some implicit monetary value. False positives represent the impressions that we paid for but that did not get clicked, so this is money lost. True negatives represent the cases where the model correctly predicted there would be no click and no money was spent, so this is money saved. False negatives represent impressions where the model predicted no click but in reality there would have been one, so this is money lost out on.

4. Confusion matrix

You can view the four categories using the confusion_matrix function from sklearn, which takes an array of actual targets and an array of predicted targets. The result is a 2-D array where the entry at index [i][j] is the number of examples that are actually in group i but were predicted to be in group j. For example, the entry at [0][1] is the total number of examples predicted to be positive (since j = 1) but actually not positive (since i = 0), and therefore represents the false positives. In this case, that number is 166. Besides indexing, since the confusion matrix is a NumPy array, we can use the ravel method to "flatten" it. For a binary target, ravel returns the elements in row-by-row order, so the order is: true negatives, false positives, false negatives, then true positives.
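The indexing and flattening described above can be sketched as follows. The label arrays are hypothetical toy data (not the course dataset, so the false positive count here is not 166); the calls to `confusion_matrix` and `ravel` are the ones named in the lesson.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical toy labels, for illustration only (1 = click, 0 = no click)
y_true = np.array([1, 0, 0, 1, 0, 1, 0, 0])
y_pred = np.array([1, 1, 0, 0, 0, 1, 0, 1])

# Rows are the actual classes, columns are the predicted classes
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[3 2]
#  [1 2]]

# Actual 0 (row 0) but predicted 1 (column 1): the false positives
false_positives = cm[0][1]

# Flatten row by row to unpack all four categories at once
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # 3 2 1 2
```

Unpacking with ravel relies on the target being binary; with more than two classes the flattened array has more than four entries and the unpacking would fail.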

5. ROI analysis

Now let's look into how we can use these four categories to assess the ROI (return on investment) on ad spend. Impressions are typically priced per 1000, so let's denote the resulting cost of a single impression as a constant, c. We will assume each click (through downstream effects, such as a chance to purchase a product) has some return r. Only true positives produce clicks, so the total return on clicks is tp * r. We pay for every impression the model predicts will be clicked, whether or not the click happens, so the associated cost is (tp + fp) * c. Therefore, we want tp * r > (tp + fp) * c, and we can look at the ratio of the two quantities, (tp * r) / ((tp + fp) * c), as the ROI on ad spend.
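As a rough sketch of this calculation, the helper below computes the ratio of total return to total cost. The function name `ad_spend_roi` and the values chosen for r and c are assumptions made up for illustration, and c is treated here as the cost of a single purchased impression.

```python
def ad_spend_roi(tp, fp, r, c):
    """Ratio of total return (tp * r) to total cost ((tp + fp) * c)."""
    total_return = tp * r      # only true positives produce clicks
    total_cost = (tp + fp) * c  # every predicted click is paid for
    if total_cost == 0:
        raise ValueError("No impressions were purchased, so ROI is undefined.")
    return total_return / total_cost


# Hypothetical numbers: 2 true positives, 2 false positives,
# a return of 0.20 per click and a cost of 0.05 per purchased impression
roi = ad_spend_roi(tp=2, fp=2, r=0.2, c=0.05)
print(f"ROI: {roi:.2f}")  # ROI: 2.00
```

A ratio above 1 means the return from clicks exceeds the ad spend, which is the condition tp * r > (tp + fp) * c from the lesson. True negatives and false negatives do not appear in the ratio because no money is spent on impressions the model predicts will not be clicked.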

6. Let's practice!

Now that you've learned about the four categories, the confusion matrix, and how to conduct an ROI analysis, let's jump right in!