
The population

1. The population

In this video you will learn what the requirements for a sound population are.

2. Population requirements

Recall from the previous course that the population of your basetable is the set of observations, in many cases persons, for which you want to make a prediction. These persons should be eligible to be targeted. Consider the example where you want to predict which donors are most likely to donate after you send them a letter asking for a donation. In this case, the population should consist of all candidate donors who are eligible to receive a letter: the donor’s address should be available, the donor’s privacy settings should allow sending a letter, and so on. It is important not to include ineligible donors in the population, because they could be a different type of donor that would bias your model. For instance, donors who do not disclose their address might be more hesitant to donate. If you build the predictive model on a population that includes these donors, the model will learn this effect, which is undesirable because the population to which the model will be applied does not contain this type of donor. Conversely, it is important to include all eligible donors, so that the whole range of eligible donors is reflected in the model.
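As a minimal sketch of this eligibility filter, assume each candidate donor is a record with hypothetical `address` and `allows_letters` fields (these names are illustrative, not taken from the course data). The population then keeps only the donors to whom a letter can actually be sent:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Donor:
    donor_id: int
    address: Optional[str]  # None when the address is unknown (hypothetical field)
    allows_letters: bool    # hypothetical privacy-setting flag


def is_eligible(donor: Donor) -> bool:
    # A donor can only be targeted if a letter can actually be sent
    return donor.address is not None and donor.allows_letters


candidates = [
    Donor(1, "12 Main St", True),
    Donor(2, None, True),          # no address: not eligible
    Donor(3, "4 Oak Ave", False),  # opted out of letters: not eligible
    Donor(4, "9 Elm Rd", True),
]

# Keep every eligible donor, and only eligible donors
population = [d.donor_id for d in candidates if is_eligible(d)]
print(population)  # [1, 4]
```

Applying the same filter both when building the basetable and when scoring keeps the training population and the deployment population consistent.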

3. Timeline compliant population: age (1)

Secondly, the population should also be compliant with the timeline. If the basetable is constructed using a timeline with May 1st 2018 as the start date of the target period, the population criteria must hold on May 1st 2018.

4. Timeline compliant population: age (2)

For instance, if you want to send a letter to donors younger than 25, you should make sure they are younger than 25 on May 1st 2018.
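A small sketch of evaluating age on the start date of the target period rather than on today's date. The helper `age_on` and the birth dates below are hypothetical; the point is that the reference date is fixed to May 1st 2018:

```python
from datetime import date


def age_on(birth_date: date, reference_date: date) -> int:
    """Age in completed years on reference_date."""
    had_birthday = (reference_date.month, reference_date.day) >= (
        birth_date.month,
        birth_date.day,
    )
    return reference_date.year - birth_date.year - (0 if had_birthday else 1)


reference_date = date(2018, 5, 1)  # start of the target period

# Hypothetical donors and their birth dates
birth_dates = {
    "a": date(1993, 5, 1),   # turns 25 exactly on the reference date
    "b": date(1993, 5, 2),   # still 24 on the reference date
    "c": date(2000, 1, 15),
}

# Donors younger than 25 on May 1st 2018
young_donors = {d for d, b in birth_dates.items() if age_on(b, reference_date) < 25}
print(sorted(young_donors))  # ['b', 'c']
```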

5. Timeline compliant population: donations (1)

Let’s consider an example. You want to construct a predictive model that predicts which donors are most likely to donate in the month after May 1st 2018.

6. Timeline compliant population: donations (2)

The only restriction on the population is that the donor donated in 2017 but has not yet donated in 2018.

7. Timeline compliant population: donations (3)

The basetable can be constructed by reconstructing this situation one year earlier, as depicted on the timeline below. This means that the population of the basetable consists of donors who donated in 2016 but did not donate between January 1st and May 1st 2017.
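One way to make the one-year shift explicit is to derive both date windows from the reference date. This sketch assumes half-open [start, end) windows and uses a hypothetical helper, `population_windows`, that encodes only the rule of this example (donated in the previous calendar year, not yet in the current one):

```python
from datetime import date


def population_windows(reference_date: date):
    """Return (include_window, exclude_window) as half-open [start, end) ranges.

    include: the full calendar year before the reference date's year
    exclude: from January 1st of the reference date's year up to the reference date
    """
    year = reference_date.year
    include = (date(year - 1, 1, 1), date(year, 1, 1))
    exclude = (date(year, 1, 1), reference_date)
    return include, exclude


# Population to score: target period starts on May 1st 2018
scoring_include, scoring_exclude = population_windows(date(2018, 5, 1))

# Population of the basetable: the same situation, one year earlier
basetable_include, basetable_exclude = population_windows(date(2017, 5, 1))

print(basetable_include)  # 2016-01-01 up to (not including) 2017-01-01
print(basetable_exclude)  # 2017-01-01 up to (not including) 2017-05-01
```

Because both populations come from the same function, the basetable and the scoring population cannot drift apart in definition; only the reference date changes.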

8. Population in python

To construct this population, we can use the list of available donations. We first select the donations made in 2016 and extract the unique donors by converting their donor ids to a set, set_include, which contains the ids of donors who made a donation in 2016. Then we select the donations made in 2017 before May 1st and extract their unique donors in the same way, giving set_exclude, which contains the ids of donors who made a donation between January 1st and May 1st 2017. The final population consists of the donors in set_include that are not in set_exclude, so we can construct it as the set difference of these two sets. As a result, population is a set containing the unique ids of donors who donated in 2016 but not between January 1st and May 1st 2017. If needed, you can convert this set back to a list afterwards.
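The slide code is not reproduced in this transcript, so the following is a sketch of the steps above in plain Python. It assumes the donations are available as a list of hypothetical `(donor_id, donation_date)` tuples; with a pandas DataFrame you would instead filter the rows on the date column and call `set()` on the donor id column:

```python
from datetime import date

# Hypothetical donation records: (donor_id, donation_date)
donations = [
    (1, date(2016, 3, 12)),
    (1, date(2017, 2, 5)),
    (2, date(2016, 11, 20)),
    (3, date(2017, 6, 1)),
    (4, date(2016, 7, 4)),
    (4, date(2017, 8, 15)),
]

# Unique donors who donated in 2016
set_include = {
    donor for donor, d in donations if date(2016, 1, 1) <= d < date(2017, 1, 1)
}

# Unique donors who donated between January 1st and May 1st 2017
set_exclude = {
    donor for donor, d in donations if date(2017, 1, 1) <= d < date(2017, 5, 1)
}

# Donors in set_include that are not in set_exclude
population = set_include - set_exclude

# Convert back to a (sorted) list if needed
population_list = sorted(population)
print(population_list)  # [2, 4]
```

Donor 1 is dropped because of the February 2017 donation, and donor 3 never enters the population because it made no donation in 2016. Donor 4's August 2017 donation falls after May 1st 2017, so it does not exclude that donor.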

9. Let's practice!

Time to practice! You will learn to construct the population in several situations.