Absolute difference between two years
As an other variable, you might want to add the absolute difference in donations made in the last year (2017) and the number of donations made in the year before that (2016).
To add this variable to the basetable
, two pandas dataframes gifts_2016
and gifts_2017
with donations made in 2016 and 2017 respectively are constructed for you.
Recall that you can count the occurances in a pandas dataframe df
for each id in a column id
in a pandas dataframe df using the groupby
function:
df_grouped = df.groupby("id")["value"].count().reset_index()
and that you can left join two dataframes df1
and df2
on a given merge_id
using the following code:
merged_df = pd.merge(df1, df2, on=["merge_id"], how="left")
This exercise is part of the course
Intermediate Predictive Analytics in Python
Exercise instructions
- Calculate for each donor in
gifts_2016
the number of donations made in 2016. - Calculate for each donor in
gifts_2017
the number of donations made in 2017. - Add the number of donations made in 2016 and 2017 to the basetable.
- Calculate the difference between the number of donations made in 2017 and 2016 to the basetable.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Number of gifts in 2016 and 2017 for each donor
gifts_2016_bydonor = gifts_2016.____("____").____().reset_index()
gifts_2016_bydonor.columns = ["donor_ID", "donations_2016"]
gifts_2017_bydonor = gifts_2017.____("____").____().reset_index()
gifts_2017_bydonor.columns = ["donor_ID", "donations_2017"]
# Add number of gifts in 2016 and 2017 to the basetable
basetable = pd.merge(____, ____, on="donor_ID", how="____")
basetable = pd.merge(____, ____, on="donor_ID", how="____")
# Calculate the number of gifts in 2017 minus number of gifts in 2016
basetable.fillna(0)
basetable["gifts_2017_min_2016"] = ____["____"] - ____["____"]
print(basetable.head())