1. Introduction to click-through rates
Hi everyone! I'm Kevin, and I'll be your instructor for this course on predicting CTR with machine learning in Python. In this course, we'll be covering a wide variety of topics ranging from understanding the ad ecosystem to various machine learning applications. For this lesson, we'll introduce some basic concepts in the ads ecosystem and discuss the importance of click-through rates (or CTR), a key metric.
2. Click-through rates
What makes an ad effective or not? One primary metric is called the click-through rate, or CTR, and is defined as the number of clicks on an ad divided by the number of views, or impressions, that the ad received. Because CTR reflects user engagement, companies serving ads (like Google and Facebook) want to maximize this metric to deliver the most usefulness to users of their platforms. On the flip side, those running ad campaigns, such as marketers and small businesses, also want to maximize CTR to deliver value to end users. Accurate prediction of CTR informs important media buying decisions, such as which ads to show and to which users. The goal of this course will be to learn how to better use machine learning as a tool in analyzing CTRs to help maximize the effectiveness of an ad, whether starting from scratch or improving performance on existing ads.
3. A classification lens
Let's look at CTR through the machine learning lens of classification. Classification is the problem of assigning categories to new observations based on data seen from past observations. The past observations are called the training set, and the new observations are called the testing set. A classifier uses training data to learn how to make a prediction, and those predictions are evaluated on the testing data. The variable that the classifier is trying to predict is called the target variable. Since users either click or do not click on an ad, we will be using a binary target: a 0 if an ad is not clicked, and a 1 if it is. The variables used for this prediction are called the features, and are information about the device, the user, etc. With more data and the relevant features, CTR prediction can become more accurate.
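To make the vocabulary concrete, here is a minimal sketch of separating features from a binary target and holding out a testing set. The column names (`device_type`, `banner_pos`, `click`) and the values are made up for illustration, and the 75/25 split is an arbitrary choice, not something prescribed by the course.

```python
import pandas as pd

# Hypothetical click log: one row per impression, "click" is the binary target.
df = pd.DataFrame({
    "device_type": [0, 1, 1, 0, 1, 0, 1, 1],
    "banner_pos":  [0, 0, 1, 1, 0, 0, 1, 0],
    "click":       [0, 1, 0, 0, 1, 0, 0, 1],
})

# Features are every column except the target.
X = df.drop(columns="click")
y = df["click"]

# Hold out 25% of the rows as the testing set; the rest is the training set.
test_idx = df.sample(frac=0.25, random_state=0).index
X_train, y_train = X.drop(index=test_idx), y.drop(index=test_idx)
X_test, y_test = X.loc[test_idx], y.loc[test_idx]

print(len(X_train), len(X_test))
```

A classifier would be fit on `X_train` and `y_train`, and its predictions would then be compared against `y_test`.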
4. A brief look at sample data
Click log data, which tracks user clicks, has many different types of objects. Here is an example of such data in DataFrame format. As you can see, each row will often include different entities such as device type, the position of the ad, etc. These entities are the columns of the DataFrame and are the features. We can do basic operations on these features, like filtering and combining. For example, we can use the .isin method to get all of the columns that include the word "device" as follows.
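The slide's code is not part of this transcript, so the following is only a sketch of what selecting the "device" columns could look like. The DataFrame and its column names are hypothetical. Note that `isin` tests exact membership, so it needs the full column names; for a substring match on the word "device", `str.contains` on the column labels is one option.

```python
import pandas as pd

# Hypothetical click log with two device-related columns.
df = pd.DataFrame({
    "device_type":  [0, 1, 1, 0],
    "device_model": ["a", "b", "b", "c"],
    "banner_pos":   [0, 0, 1, 1],
    "click":        [0, 1, 0, 1],
})

# isin: keep the columns whose names appear in an explicit list.
device_cols_exact = df.loc[:, df.columns.isin(["device_type", "device_model"])]

# str.contains: keep the columns whose names contain the substring "device".
device_cols_substr = df.loc[:, df.columns.str.contains("device")]

# isin also works on values, for example keeping rows with device_type equal to 1.
type1_rows = df[df["device_type"].isin([1])]
```

Both column selections return the same two columns here; the second does not require knowing the exact names in advance.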
The end goal will be predicting CTR, which for a sample dataset can be found by dividing the number of clicks by the total number of impressions (one row per impression), using the sum method and the len function.
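As a small sketch of that calculation, assuming a hypothetical `click` column holding 0 or 1 for each impression:

```python
import pandas as pd

# Hypothetical log: 8 impressions, 2 of which were clicked.
df = pd.DataFrame({"click": [0, 1, 0, 0, 1, 0, 0, 0]})

# CTR = number of clicks / number of impressions (rows).
ctr = df["click"].sum() / len(df)
print(f"CTR: {ctr:.2%}")  # CTR: 25.00%
```

Because the target is binary, `df["click"].mean()` gives the same value.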
5. Analyzing features
An important step in building the model will be analyzing features, which usually require cleaning and transformation. To look at how often each value of a feature occurs, the value_counts method returns the counts of unique values of a given column. Here is an example with the device_type column, with two device_type values, 0 and 1. Another commonly used operation in analysis is the groupby method, which involves splitting a DataFrame into groups, applying a function to each group, and combining the results. Here is an example with total clicks by device type.
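The two operations can be sketched as follows. The data is made up for illustration; only the `device_type` column name and its two values, 0 and 1, come from the lesson, and `click` is an assumed name for the target column.

```python
import pandas as pd

# Hypothetical click log with two device types, 0 and 1.
df = pd.DataFrame({
    "device_type": [0, 1, 1, 0, 1, 1],
    "click":       [0, 1, 0, 1, 1, 0],
})

# How many impressions came from each device type.
type_counts = df["device_type"].value_counts()
print(type_counts)

# Total clicks by device type: split by device_type, sum the clicks, combine.
clicks_by_device = df.groupby("device_type")["click"].sum()
print(clicks_by_device)
```

Replacing `sum()` with `mean()` in the groupby would give the CTR per device type instead of the click total.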
6. Let's practice!
Now that we've done a high-level overview of click-through rates, let's play around with some sample data!