Creating a missing value dummy
You are given a basetable with a predictive variable "total_donations" that holds the total number of donations a donor has ever made. This variable can have missing values, which indicate that the donor has never made a donation. That is useful information in its own right, so it is appropriate to create a variable "no_donations" that flags whether "total_donations" is missing.
This exercise is part of the course Intermediate Predictive Analytics in Python.
Exercise instructions
- Create a new column "no_donations" in basetable that has value 1 if total_donations is missing and 0 otherwise.
- Calculate the number of missing values in total_donations and assign it to number_na.
- Print the percentage of missing values.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create dummy indicating missing values
basetable["____"] = pd.Series([____ if b else ____ for b in basetable["total_donations"].isna()])
# Calculate number of missing values
number_na = sum(____["no_donations"] == ____)
# Calculate percentage of missing values
print(round(____ / ____(____), 2))
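
For reference, the three steps can be sketched on a small made-up table. This is a minimal illustration rather than the exercise's expected answer: the toy data, the donor_id column, and the share_na name are assumptions introduced here, and the dummy is built with the vectorized isna().astype(int) instead of the list comprehension in the template.

```python
import numpy as np
import pandas as pd

# Hypothetical toy basetable; the values are made up for illustration
basetable = pd.DataFrame({
    "donor_id": [101, 102, 103, 104, 105],
    "total_donations": [3.0, np.nan, 1.0, np.nan, 7.0],
})

# Create dummy indicating missing values (1 if missing, 0 otherwise)
basetable["no_donations"] = basetable["total_donations"].isna().astype(int)

# Calculate number of missing values
number_na = int(basetable["total_donations"].isna().sum())

# Share of missing values as a fraction between 0 and 1, rounded to 2 decimals
share_na = round(number_na / len(basetable), 2)
print(number_na, share_na)  # prints: 2 0.4
```

One reason to prefer the vectorized form is index alignment. Wrapping a list in pd.Series gives it a fresh 0..n-1 index, and assigning a Series to a column aligns on index labels, so the template's pattern fills the column correctly only when basetable has the default integer index. The isna() result keeps the index of basetable, so it aligns in either case.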