Using multiple snapshots

1. Using multiple snapshots

Sometimes, the basetable is too small to construct a sound model. In this video, you will learn how to use multiple snapshots to increase the basetable size.

2. Not enough data

Several problems can occur when using a small basetable. Overtraining can happen when the predictive model fits the small basetable closely but does not generalize to new data. Another problem could be that there are not enough examples in the basetable to train the model well, which is reflected in a low AUC value. Even if the population of the basetable is fairly large, it can be hard to construct an accurate predictive model if the absolute number of targets is too small. In this case, there are not enough observations with target value 1 (people that actually made a gift) to teach the model how to distinguish these observations from the rest. There is no clear rule about how large the basetable should be to construct an accurate model; this depends on the candidate predictors and the target definition.
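As a quick check of the points above, you can count the population and the number of targets in a basetable. The sketch below assumes the basetable is a pandas DataFrame with one row per donor and a 0/1 column named `target`; the column names and the toy data are hypothetical.

```python
import pandas as pd

# Hypothetical toy basetable: one row per donor, target = 1 if the donor made a gift.
basetable = pd.DataFrame({
    "donor_id": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "target": [0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
})

population_size = len(basetable)             # number of observations
n_targets = int(basetable["target"].sum())   # absolute number of targets
target_incidence = n_targets / population_size

print(population_size, n_targets, target_incidence)
```

A low absolute number of targets can be a warning sign even when the population itself looks large enough.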

3. Using multiple snapshots (1)

One way to increase the basetable size is to use multiple snapshots. Consider a small non-profit organisation with a donor base of about 1000 donors. You want to predict whether a donor will donate more than 50 euros in the next month. The model will be used in April 2019.

4. Using multiple snapshots (2)

When reconstructing the timeline in the past, you can use April 2018 as the target period.

5. Using multiple snapshots (3)

However, the resulting basetable is rather small: the population consists of only about 1000 donors.

6. Using multiple snapshots (4)

To solve this, you can additionally reconstruct the timeline with target period May 2018. This results in another basetable with again about 1000 donors.
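Reconstructing the timeline for two target periods could look roughly like the sketch below. It assumes a hypothetical `gifts` table with columns `donor_id`, `date` and `amount`, and it assumes the target is defined on the total amount donated in the target period; the helper `build_basetable` is illustrative and not part of any library. Predictive variables are left out to keep the example short.

```python
import pandas as pd

def build_basetable(donor_ids, gifts, target_start, target_end, threshold=50):
    """One row per donor; target = 1 if the donor gave more than
    `threshold` in total between target_start and target_end (inclusive)."""
    in_period = gifts[(gifts["date"] >= target_start) & (gifts["date"] <= target_end)]
    totals = in_period.groupby("donor_id")["amount"].sum()
    basetable = pd.DataFrame({"donor_id": donor_ids})
    # Donors without gifts in the target period get a total of 0.
    donated = basetable["donor_id"].map(totals).fillna(0)
    basetable["target"] = (donated > threshold).astype(int)
    basetable["snapshot"] = target_start  # remember which snapshot a row comes from
    return basetable

# Hypothetical toy data.
gifts = pd.DataFrame({
    "donor_id": [1, 2, 2, 3],
    "date": pd.to_datetime(["2018-04-10", "2018-04-15", "2018-05-03", "2018-05-20"]),
    "amount": [60, 20, 80, 30],
})
donor_ids = [1, 2, 3]

basetable_april = build_basetable(
    donor_ids, gifts, pd.Timestamp("2018-04-01"), pd.Timestamp("2018-04-30"))
basetable_may = build_basetable(
    donor_ids, gifts, pd.Timestamp("2018-05-01"), pd.Timestamp("2018-05-31"))
```

Each call yields one basetable over the same donors, with a target that reflects a different target period.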

7. Stacking basetables

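Stacking these basetables results in a larger basetable of about 2000 observations, in which each donor appears once per snapshot. In Python, you can stack two pandas basetables using the `pd.concat` function. Older pandas versions also offered the `append` method on DataFrames, but it was deprecated and then removed in pandas 2.0.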
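A minimal sketch of the stacking step, assuming both basetables are pandas DataFrames with the same columns. The toy data is hypothetical, and `pd.concat` is used here because it is available in current pandas releases.

```python
import pandas as pd

# Hypothetical basetables from two snapshots, with identical columns.
basetable_april = pd.DataFrame({"donor_id": [1, 2, 3], "target": [1, 0, 0]})
basetable_may = pd.DataFrame({"donor_id": [1, 2, 3], "target": [0, 1, 0]})

# Stack the rows; ignore_index=True renumbers the rows from 0 to n-1.
basetable = pd.concat([basetable_april, basetable_may], ignore_index=True)

print(len(basetable))
```

The stacked basetable has as many rows as the two inputs combined, so the same donor can appear once per snapshot.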

8. Snapshots and seasonality

In the previous example, two snapshots were used to create the final basetable. In practice, it might be necessary to include more snapshots. In some cases, 10 or even more snapshots are used. When using multiple snapshots, you should keep seasonality in mind. For instance, as donations are higher during the holidays, you should not include the December or January snapshot if you want to make predictions in April. On the other hand, if you want to build a model that predicts donations during the holidays, you should include snapshots around the holidays. In summary, you should make sure the snapshots used are representative of the period in which the predictive model will be used.
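One way to apply this is to keep track of the snapshot each row comes from and to drop unrepresentative snapshots before modelling. The sketch below assumes a hypothetical `snapshot` column holding the start of each target period, and treats December and January as the holiday months; both are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stacked basetable with a column marking each row's snapshot.
basetable = pd.DataFrame({
    "donor_id": [1, 2, 1, 2, 1, 2],
    "snapshot": pd.to_datetime([
        "2018-04-01", "2018-04-01",
        "2018-05-01", "2018-05-01",
        "2018-12-01", "2018-12-01",
    ]),
    "target": [1, 0, 0, 1, 1, 1],
})

holiday_months = [12, 1]  # assumption: December and January are the holiday period

# For a model used in April, drop the snapshots taken around the holidays.
is_holiday = basetable["snapshot"].dt.month.isin(holiday_months)
basetable_april_model = basetable[~is_holiday].reset_index(drop=True)

print(len(basetable_april_model))
```

For a model that predicts holiday donations, the filter would be inverted so that only snapshots around the holidays are kept.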

9. Let's practice!

Time to practice! In the exercises, you will learn how to combine different basetables based on multiple snapshots.
