
A timeline compliant population

Assume that you want to construct a basetable for a model that predicts whether donors will donate in 2018. The timeline indicates that the population should contain all donors who donated at least once since January 1st 2013 but made no donations on or after January 1st 2017. You are given a pandas dataframe gifts with all the donations made since 2010. In this exercise, you will construct a set with the donor ids of all donors in the population.

This exercise is part of the course Intermediate Predictive Analytics in Python.

Exercise instructions

  • Construct a dataframe gifts_include containing all gifts made in 2013 or later and a dataframe gifts_exclude containing all gifts made in 2017 or later.
  • Construct a set donors_include containing all donor ids of donors in gifts_include and a set donors_exclude containing all donor ids of donors in gifts_exclude.
  • Construct the population using the .difference() method on your two sets.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Gifts made in 2013 or later
gifts_include = ____[____[____].dt.year >= ____]

# Gifts made in 2017 or later
gifts_exclude = ____[____[____].dt.year >= ____]

# Set with ids in gifts_include
donors_include = ____(____[____])

# Set with ids in gifts_exclude
donors_exclude = ____(____[____])

# Population
population = ____.difference(____)
print(len(population))
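
The steps above can be sketched on a small made-up dataset. This is only an illustration under stated assumptions: the exercise does not name the columns of gifts, so the column names id and date used here are hypothetical, and the toy rows are invented purely to show how the two filters and the set difference interact.

```python
import pandas as pd

# Toy stand-in for the gifts dataframe.
# ASSUMPTION: the columns are called "id" and "date"; the real names may differ.
gifts = pd.DataFrame({
    "id": [1, 1, 2, 3, 3, 4],
    "date": pd.to_datetime([
        "2012-05-01", "2014-03-10",
        "2011-07-20",
        "2015-01-15", "2017-02-02",
        "2016-12-31",
    ]),
})

# Gifts made in 2013 or later
gifts_include = gifts[gifts["date"].dt.year >= 2013]

# Gifts made in 2017 or later
gifts_exclude = gifts[gifts["date"].dt.year >= 2017]

# Sets with the donor ids that appear in each selection
donors_include = set(gifts_include["id"])
donors_exclude = set(gifts_exclude["id"])

# Donors who gave since 2013 but not in 2017 or later
population = donors_include.difference(donors_exclude)
print(len(population))
```

In this toy data, donor 2 is left out because their only gift predates 2013, and donor 3 is left out because they gave in 2017, so only donors 1 and 4 remain in the population.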