The target

1. The target

Once the timeline is set and the population is in place, you are ready to add the target to the basetable.

2. Target definition

The target is a special column in the basetable, namely the value, zero or one, that you want to predict. In a predictive modeling setting, the value of the target equals one if a certain event happens during the target period for the observation, and zero otherwise. These events can take many forms, depending on what you want to predict. In the donations example, it could be whether a donor donates a certain amount during a given period, whether a donor changes address during a given period or whether a donor unsubscribes from the donor list.

3. Target timeline (1)

Again, one should keep in mind the timeline when adding the target to the basetable. The true target is unknown by definition: it is the unknown event that you want to predict.

4. Target timeline (2)

In order to obtain a basetable that has a target column on which the model can be constructed, we need to use a timeline earlier in time where the target is known. For instance, assume today is August 1st 2018 and you want to construct a predictive model that predicts whether someone will donate in August 2018. Then you should define the target on a similar timeline, for instance one year earlier.

5. Target timeline (3)

The target in the basetable is then 1 if the donor donated in August 2017 and 0 otherwise.

6. Defining the target in Python

Coding the target in Python strongly depends on the definition of the target and how the information about the target event is stored in the data. For instance, consider the case where the target is whether someone unsubscribes from the donor list in 2017, and a list called `unsubscribe` with ids of donors that unsubscribe in 2017 is given. The basetable is already initialized in the pandas dataframe `basetable`. All you need is some python magic to add the target column to the basetable now. You can use list comprehension to create a list that has one value for each entry in the basetable donor id column. If the donor id is in the unsubscribe list, the target should be 1, and otherwise 0. All that's left to do is add this list as a column in your dataframe, using pandas series.

7. Defining an aggregated target in Python

Consider an other example where the target is whether someone is a golden donor in 2017, that is, he donates more than 500 Euro in 2017. The past donations are available in the gifts pandas dataframe, the basetable is already initialized in the pandas dataframe `basetable`. To calculate the target, you should first select all the donations made in 2017. You can do this by defining the start and end date of the target period. Next, you select donations that are within this target period. As you want to know whether someone donates more than 500 Euro, you need to group the donations by the donor id, and make the sum. The `reset_index()` function makes sure the result is formatted as a dataframe. With these values, you can construct a list containing the donor ids of donors that donated more than 500 Euro. This target column can be added to the basetable using list comprehension.

8. The basetable

After adding the target column, the basetable looks like this. It has one row for each observation in the population, a donor_id column that has a unique id for each donor and the target column that indicates whether the donor is a target or not. In the next chapters you will learn how to add candidate predictors to this basetable.

9. Let's practice

For now, let's practice defining targets in the exercises.

Create Your Free Account

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.