1. Model evaluation
In this lesson, we will revisit and build on some of the evaluation metrics from the last chapter.
2. Precision and recall
Recall that in Chapter 3, we discussed the business interpretations of precision and recall. Precision represents the return on ad spend through clicks: low precision means very little tangible ROI because few of the ads shown are actually clicked. Recall represents how well the ads target the relevant audience: low recall means the company is not showing as many relevant ads as it could, and is losing out on opportunities for ROI. In practice, the two metrics may not be treated equally: it seems sensible that a company might care more about precision, which is a tangible ROI, than about recall, which is a more intangible one.
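As a minimal sketch, both metrics can be computed with scikit-learn; the labels below are made-up click data, purely for illustration:

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels: 1 = the ad was clicked, 0 = it was not
y_true = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
# Hypothetical model predictions for the same impressions
y_pred = [0, 1, 1, 0, 0, 1, 0, 0, 1, 0]

# Precision: of the impressions we predicted would be clicked, how many were
print(precision_score(y_true, y_pred))
# Recall: of the impressions that were actually clicked, how many we caught
print(recall_score(y_true, y_pred))
```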
3. F-beta score
To address this difference in weighting the two metrics, we can look at the F-beta score, a weighted harmonic mean of precision and recall given by the formula above. Harmonic means that its value always lies between precision and recall but is closer to the smaller of the two (unlike an arithmetic mean, which sits exactly in the middle). The beta coefficient controls how the two metrics are weighted: with beta = 1 they are weighted equally; with beta between 0 and 1 the score penalizes low precision more, so precision is weighted more heavily; with beta greater than 1, recall is weighted more heavily. Therefore, if we care more about precision, we want to choose a beta between 0 and 1, say 0.5. We can calculate the F-beta score using sklearn's fbeta_score function, which takes a beta parameter after y_true and y_pred.
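As a sketch, reusing the same made-up labels and predictions as above, weighting precision more with beta = 0.5 might look like this:

```python
from sklearn.metrics import fbeta_score

# Same hypothetical click labels and predictions as before
y_true = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0, 1, 0]

# F-beta = (1 + beta^2) * precision * recall / (beta^2 * precision + recall);
# beta = 0.5 weights precision more heavily than recall
print(fbeta_score(y_true, y_pred, beta=0.5))
```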
4. AUC of ROC curve versus precision
As before, the AUC of the ROC curve is another important evaluation metric. It can be calculated in one line with sklearn's roc_auc_score function, which takes the true labels followed by the predicted click probabilities. Recall that in Chapter 1 we briefly discussed how the click-through rate can be low despite a high AUC of the ROC curve. Here we will dig a little deeper into that phenomenon, which is very relevant for CTR prediction. Remember that the ROC curve plots the true positive rate against the false positive rate (the AUC is the area under that curve), whereas precision is the number of true positives divided by the total of true and false positives, as shown in the equations. With an imbalanced dataset where true positives are rare relative to true negatives, as with clicks on ads, the false positive rate can be low, giving a high AUC of the ROC curve, while precision is low. To be concrete, suppose we have 100 true negatives, 10 true positives, and 10 false positives. The false positive rate is 10/110, roughly 0.09, which is consistent with a high AUC, yet the precision is only 10/20 = 0.5. Therefore it is important to monitor both metrics, along with a weighted F-beta score.
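Here is a minimal sketch of both calculations, using made-up scores for roc_auc_score and the confusion-matrix counts from the example above:

```python
from sklearn.metrics import roc_auc_score

# roc_auc_score takes the true labels followed by the predicted probabilities
y_true = [0, 0, 0, 1, 1]                 # hypothetical click labels
y_scores = [0.1, 0.4, 0.35, 0.8, 0.7]    # hypothetical predicted probabilities
print(roc_auc_score(y_true, y_scores))   # 1.0 here: both clicks rank on top

# The imbalanced example: 100 true negatives, 10 true positives, 10 false positives
tn, tp, fp = 100, 10, 10
fpr = fp / (fp + tn)        # 10 / 110 ≈ 0.09 -> low, consistent with a high AUC
precision = tp / (tp + fp)  # 10 / 20  = 0.5  -> low precision
print(fpr, precision)
```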
5. ROI on ad spend
The same ROI analyses from the last chapter are still relevant and important. As a reminder, we make some assumptions about the cost of serving ads and the return per click, then compare the total return from clicks (the true positives) against the associated cost, which also covers the false positives. The ROI can be simplified to a function of precision and the ratio of return to cost, as shown.
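The exact formula is on the slide; as a rough sketch, assuming a return r per clicked ad (true positive) and a cost c per ad served to a predicted positive (true plus false positives), the simplification looks like this:

```python
def roi(tp, fp, r, c):
    """Hypothetical ROI sketch: total return over total cost."""
    total_return = r * tp             # return only comes from actual clicks
    total_cost = c * (tp + fp)        # cost is incurred for every ad served
    return total_return / total_cost  # = (r / c) * precision

# Example: precision = 10 / 20 = 0.5 and return-to-cost ratio r / c = 4
print(roi(tp=10, fp=10, r=2.0, c=0.5))  # 0.5 * 4 = 2.0
```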
6. Let's practice!
Now that we've done a high-level overview of model evaluation, let's jump right in!