Removing duplicate objects
Assume that you want to construct a predictive model in order to select donors that are most likely to respond on a letter. The population of the basetable should contain donors that have an adress available, and that have privacy settings that allow to send them a letter.
All candidate donors are given in a dataframe donors
with three columns: the donor_id
, a flag address
that is 1 if the address is available and 0 otherwise, and a flag letter_allowed
that is 1 if one can send this donor a letter and 0 otherwise.
In this exercise you will construct a set with the donors that should go in the population.
This exercise is part of the course
Intermediate Predictive Analytics in Python
Exercise instructions
- Create a dataframe
donors_population
only containing observations that have address available and for which a letter is allowed. - Create a list containing the donor ids in
donors_population
. - Construct the final population and then numbers of donors in it.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create a dataframe donors_population
donors_population = ____[(____["____"] == ____) & (____["____"] == ____)]
# Create a list of donor IDs
population_list = ____(____["____"])
# Select unique donors in population_list
population = ____(____)
print(len(population))