
The basetable timeline

1. The basetable timeline

Hi! Welcome to the second Predictive Analytics course! My name is Nele, I'm a data scientist at Python Predictions.

2. The predictive modeling process

In the first course, you learned to build predictive models, evaluate them and present them to the business. In this course, you will take one step back and learn how to build the basetable that is used to build the predictive model.

3. The basetable (1)

A predictive model can be used to predict an event. All the information needed to make these predictions is stored in the basetable. There are three important concepts in the basetable.

4. The basetable (2)

The population is the group of people or objects you want to make a prediction for.

5. The basetable (3)

The candidate predictors describe the objects in the population. They contain the information that can be used to predict the event.

6. The basetable (4)

Finally, the target holds information about the event you want to predict. It is one if the event occurs, and zero otherwise.
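To make the three concepts concrete, here is a minimal sketch of what a basetable could look like in pandas. The donor IDs, the column names `age` and `gifts_last_year`, and all values are made up for illustration; they are not taken from the course data.

```python
import pandas as pd

# Hypothetical toy basetable: one row per donor in the population.
basetable = pd.DataFrame(
    {
        "donor_id": [101, 102, 103],   # identifies each object in the population
        "age": [34, 58, 45],           # candidate predictor (illustrative)
        "gifts_last_year": [2, 0, 5],  # candidate predictor (illustrative)
        "target": [1, 0, 1],           # 1 if the event occurs, 0 otherwise
    }
).set_index("donor_id")

population = basetable.index                             # who we predict for
candidate_predictors = basetable.drop(columns="target")  # what we predict with
target = basetable["target"]                             # what we predict

print(basetable)
```

Each row is one member of the population, every column except `target` is a candidate predictor, and `target` holds the zero-or-one outcome.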

7. The timeline (1)

When building a basetable for predictive modeling, the first thing you should do is draw a timeline. On this timeline, you can depict the situation in which you want to use the predictive model.

8. The timeline (2)

For instance, assume that you want to construct a predictive model that predicts which donors are most likely to donate at least 50 Euro in the next 3 months.

9. The timeline (3)

You want to use this model on May 1st 2018, because then you want to send a letter to these donors.

10. The timeline (4)

The timeline shows that the predictive model can only use information that is available on May 1st 2018. Everything that happens after May 1st 2018 is unknown at the time you use the model.

11. Reconstructing history (1)

Because the true target is unknown at the time you use the model, you need to reconstruct your timeline in the past, so that information about the target period is available.

12. Reconstructing history (2)

For instance, if you have donation information for 2017, you could use a timeline that goes back one year in time.

13. Reconstructing history (3)

With this timeline, the basetable can be constructed. To calculate the target, you should consider donations made in the three months after May 1st 2017, and to calculate the predictive variables, you can use all information available before May 1st 2017.
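As a rough sketch, the shift of the timeline one year into the past can be written with pandas date arithmetic. The variable names below are illustrative assumptions, not names used in the course.

```python
import pandas as pd

# Date on which the model will be used (from the example above).
deployment_date = pd.Timestamp("2018-05-01")

# Length of the target period: the next 3 months.
target_length = pd.DateOffset(months=3)

# Reconstruct the timeline one year in the past.
start_target = deployment_date - pd.DateOffset(years=1)
end_target = start_target + target_length

print("Target period starts:", start_target.date())
print("Target period ends:  ", end_target.date())
```

Donations inside the reconstructed target period feed the target, and anything dated before `start_target` may feed the predictive variables.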

14. Selecting relevant data in Python

In the following exercises, you will learn how to use Python to select, from a file with donations, only those donations that were made in a certain timeframe. Later, you will learn how to construct the target and predictive variables from these donations. You are given a list of gifts from 2013 until 2017. The start date of the target is May 1st 2017, and the end date of the target is August 1st 2017. The donations that can be used to calculate the target are those made between the start and end dates of the target. The donations that can be used to construct predictive variables are those made before the start date of the target.
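The selection described above can be sketched with boolean masks in pandas. The `gifts` DataFrame, its column names and its values are invented for illustration. Treating the start date as inclusive and the end date as exclusive is also an assumption, since the text does not say how gifts that fall exactly on a boundary are handled.

```python
import pandas as pd

# Hypothetical gifts data; in practice this would be read from a file.
gifts = pd.DataFrame(
    {
        "donor_id": [1, 1, 2, 3, 3],
        "date": pd.to_datetime(
            ["2013-06-15", "2017-05-20", "2016-12-01", "2017-07-31", "2017-09-10"]
        ),
        "amount": [20.0, 60.0, 35.0, 50.0, 100.0],
    }
)

start_target = pd.Timestamp("2017-05-01")
end_target = pd.Timestamp("2017-08-01")

# Gifts that can be used to calculate the target
# (assumption: start inclusive, end exclusive).
gifts_target = gifts[(gifts["date"] >= start_target) & (gifts["date"] < end_target)]

# Gifts that can be used to construct predictive variables:
# strictly before the start of the target period.
gifts_pred_variables = gifts[gifts["date"] < start_target]

print(gifts_target)
print(gifts_pred_variables)
```

The gift dated after the end of the target period ends up in neither selection, which is intended: it is not part of this timeline.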

15. Let's practice!

It is extremely important not to violate the timeline. If the predictive variables use information about the target, the model is not valid, because it relies on information that is not available at the time you want to use the model. We will illustrate this in the exercises.
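One way to guard against such a violation is a small check that runs before the predictive variables are calculated. This is only a sketch: the helper `assert_no_leakage` and the `date` column name are hypothetical and not part of the course material.

```python
import pandas as pd


def assert_no_leakage(predictor_gifts, start_target, date_col="date"):
    """Raise if any gift meant for predictive variables is dated on or after start_target."""
    leaking = predictor_gifts[predictor_gifts[date_col] >= start_target]
    if not leaking.empty:
        raise ValueError(
            f"{len(leaking)} gift(s) dated on or after {start_target.date()} "
            "would leak target information into the predictive variables"
        )


start_target = pd.Timestamp("2017-05-01")

# Hypothetical gifts: one before and one inside the target period.
gifts = pd.DataFrame(
    {
        "date": pd.to_datetime(["2016-12-01", "2017-06-10"]),
        "amount": [35.0, 60.0],
    }
)

# Correctly filtered gifts pass the check silently.
assert_no_leakage(gifts[gifts["date"] < start_target], start_target)

# Unfiltered gifts violate the timeline and trigger the error.
try:
    assert_no_leakage(gifts, start_target)
    leak_detected = False
except ValueError as err:
    leak_detected = True
    print("Timeline violated:", err)
```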