CTR prediction using decision trees

1. CTR prediction using decision trees

In this lesson, we'll learn about decision trees and use the concepts from the previous lesson to build a classifier for CTR prediction.

2. Decision trees

A decision tree has nodes (represented by the circles and boxes), which correspond to the features, and branches (the lines connecting them), which are decisions based on whichever features best separate the data. Here is an example for evaluating credit loans. The first split is on the age of the applicant. If they are middle-aged, the algorithm says yes to the loan immediately, which is the first row of the outcomes table. If the applicant is in the youth age group and is not a student, the algorithm says no (row 2 of the table); otherwise it says yes (row 3 of the table). When applied to CTR prediction, such a model can provide a heuristic for understanding why a particular ad is more likely to be clicked on by a particular user - is it because of the user's device, their location, the placement of the ad, etc.?
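To make the idea concrete, here is a minimal sketch of fitting a small decision tree with sklearn on made-up CTR-style features; the column names (device, banner_pos) and the toy values are illustrative assumptions, not the course dataset.

    # Minimal sketch: a decision tree on hypothetical CTR features.
    # Column names and values are illustrative only.
    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    X = pd.DataFrame({
        "device": [0, 1, 1, 0, 1, 0],      # e.g. 0 = desktop, 1 = mobile
        "banner_pos": [0, 1, 0, 1, 1, 0],  # placement of the ad
    })
    y = [0, 1, 0, 1, 1, 0]                 # 1 = click, 0 = no click

    clf = DecisionTreeClassifier(max_depth=2, random_state=0)
    clf.fit(X, y)
    print(clf.predict(X[:2]))              # class predictions for the first two rows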

3. Training and testing the model

As before, we will use a model from sklearn, along with both the fit and predict methods. Below is a recap of the relevant parameters for each method, along with example outputs for both predict and predict_proba. The probability scores from predict_proba will be important when evaluating other metrics, like the ROC curve, which we will discuss shortly. Lastly, instead of manually specifying the training and testing split of our data as before, we can use sklearn's train_test_split function going forward. This function takes in X (the features), y (the targets), and test_size (the size of the testing split as a fraction of the total sample size).
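As a rough sketch of that workflow, assuming X and y are a feature matrix and click labels that have already been loaded, the split, fit, and prediction steps could look like this:

    # Sketch of the train/test workflow described above;
    # X and y are assumed to already exist (e.g. from the earlier example).
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)   # hold out 20% of rows for testing

    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(X_train, y_train)                   # learn the tree from the training split

    y_pred = clf.predict(X_test)                # hard labels: 0 = no click, 1 = click
    y_score = clf.predict_proba(X_test)[:, 1]   # predicted probability of a click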

4. Evaluation with ROC curve

The ROC (Receiver Operating Characteristic) curve plots the true positive rate on the y-axis against the false positive rate on the x-axis for various threshold levels. For CTR prediction, true positives are when the model correctly predicts a click, and false positives are when the model predicts a click but no click occurs. We will discuss these quantities in more detail in Chapter 3. For now, know that the top-left corner is ideal since it represents a true positive rate of 1 and a false positive rate of 0. The corresponding area under the curve, or AUC, measures model skill: a higher AUC means the classifier is better at discriminating between the two target classes. The dotted blue line is a baseline with an AUC of 0.5, corresponding to a random coin-flip classifier. We want the orange line to have an AUC larger than 0.5, and ideally as close to 1 as possible.
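To illustrate the two rates, here is a small sketch computing the true and false positive rates at a single 0.5 threshold, reusing the y_test labels and y_score probabilities from the earlier sketch (an ROC curve repeats this over many thresholds):

    # Illustrative computation of TPR and FPR at one threshold.
    import numpy as np

    y_true = np.array(y_test)
    y_hat = (y_score >= 0.5).astype(int)          # threshold the click probabilities

    tp = np.sum((y_hat == 1) & (y_true == 1))     # predicted click, actual click
    fp = np.sum((y_hat == 1) & (y_true == 0))     # predicted click, no click
    fn = np.sum((y_hat == 0) & (y_true == 1))
    tn = np.sum((y_hat == 0) & (y_true == 0))

    tpr = tp / (tp + fn)   # true positive rate (y-axis of the ROC curve)
    fpr = fp / (fp + tn)   # false positive rate (x-axis of the ROC curve)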

5. AUC of ROC curve

We can use sklearn's predict_proba method, along with the roc_curve function, which takes in the test labels and the score array, to get the ROC curve. Then, we can use the corresponding auc function to get the area under the ROC curve. It takes as input the arrays of false positive rates and true positive rates, as seen here. You might be wondering: if we do have a good model, as judged by the AUC of the ROC curve, but the click-through rate is low, what does that mean? In this situation, the ads being served are likely not correctly targeting the desired audience. If you were serving these ads, you could adjust the imagery, the message, or the audience itself. We will discuss these implications in more depth in Chapter 3.
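A minimal sketch of those two calls, again assuming y_test and y_score from the earlier train/test sketch, might look like this:

    # ROC curve and AUC using the sklearn functions named above.
    from sklearn.metrics import roc_curve, auc

    fpr, tpr, thresholds = roc_curve(y_test, y_score)  # rates at each score threshold
    roc_auc = auc(fpr, tpr)                            # area under the ROC curve
    print(roc_auc)                                     # 0.5 is random; closer to 1 is better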

6. Let's practice!

Now that you've learned about decision trees and the ROC curve, let's jump into CTR prediction with decision trees!