A timeline-compliant population
Assume that you want to construct a basetable for a predictive model that predicts whether donors will donate in 2018. The timeline indicates that the population should contain all donors who donated at least once since January 1st 2013, but made no donations on or after January 1st 2017.
You are given a pandas dataframe gifts with all the donations made since 2010. In this exercise, you will construct a set with the donor ids of all donors in the population.
This exercise is part of the course
Intermediate Predictive Analytics in Python
Exercise instructions
- Construct a dataframe gifts_include containing all gifts made in 2013 or later and a dataframe gifts_exclude containing all gifts made in 2017 or later.
- Construct a set donors_include containing all donor ids of donors in gifts_include and a set donors_exclude containing all donor ids of donors in gifts_exclude.
- Construct the population using the .difference() method on your two sets.
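As a minimal illustration of the last step, the sketch below applies .difference() to two small sets. The donor ids are made up for this example and do not come from the exercise data.

```python
# Hypothetical donor ids, invented for illustration only
donors_include = {1, 2, 3, 4}
donors_exclude = {3, 4, 5}

# Keep the ids that are in donors_include but not in donors_exclude
population = donors_include.difference(donors_exclude)
print(sorted(population))  # prints [1, 2]
```

Ids that appear only in the second set (here 5) have no effect on the result.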
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Gifts made in 2013 or later
gifts_include = ____[____[____].dt.year >= ____]
# Gifts made in 2017 or later
gifts_exclude = ____[____[____].dt.year >= ____]
# Set with ids in gifts_include
donors_include = ____(____[____])
# Set with ids in gifts_exclude
donors_exclude = ____(____[____])
# Population
population = ____.difference(____)
print(len(population))
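One possible way to fill in the blanks is sketched below on a toy dataframe. The column names "id" and "date", as well as all values, are assumptions made for this sketch; the exercise does not state the real column names, so adjust them to match the actual gifts dataframe.

```python
import pandas as pd

# Toy stand-in for the gifts dataframe.
# Assumption: donor ids are in a column "id" and gift dates in a datetime column "date".
gifts = pd.DataFrame({
    "id": [1, 1, 2, 3, 4, 5, 5],
    "date": pd.to_datetime([
        "2012-05-01", "2014-03-10",  # donor 1: one gift in the 2013-2016 window
        "2015-07-22",                # donor 2: one gift in the window
        "2017-02-14",                # donor 3: gift in 2017, must be excluded
        "2011-11-30",                # donor 4: only a gift before 2013
        "2013-09-05", "2017-06-01",  # donor 5: gift in window, but also in 2017
    ]),
})

# Gifts made in 2013 or later
gifts_include = gifts[gifts["date"].dt.year >= 2013]
# Gifts made in 2017 or later
gifts_exclude = gifts[gifts["date"].dt.year >= 2017]

# Sets with the donor ids in each selection
donors_include = set(gifts_include["id"])
donors_exclude = set(gifts_exclude["id"])

# Population: donors with a gift since 2013 and none in 2017 or later
population = donors_include.difference(donors_exclude)
print(len(population))  # prints 2 for this toy data
```

With this toy data only donors 1 and 2 remain: donor 4 never donated in 2013 or later, and donors 3 and 5 are removed because they donated in 2017.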