Follow the money
In this exercise, you're working with another version of the `banking` DataFrame that contains missing values in both the `cust_id` column and the `acct_amount` column.
You want to analyze how many unique customers the bank has, the average amount held by customers, and more. You know that rows with a missing `cust_id` don't really help you, and that, on average, `acct_amount` is 5 times `inv_amount`.
In this exercise, you will drop the rows of `banking` with missing `cust_id` values and impute the missing values of `acct_amount` using that domain knowledge.
This exercise is part of the course Cleaning Data in Python.
Exercise instructions
- Use `.dropna()` to drop missing values of the `cust_id` column in `banking` and store the results in `banking_fullid`.
- Use `inv_amount` to compute the estimated account amounts for `banking_fullid` by setting the amounts equal to `inv_amount * 5`, and assign the results to `acct_imp`.
- Impute the missing values of `acct_amount` in `banking_fullid` with the newly created `acct_imp` using `.fillna()`.
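The last instruction relies on `.fillna()` accepting a dictionary that maps a column name to replacement values. When those values are a Series, pandas aligns them by index, so only the missing entries are replaced and existing values stay untouched. The following is a minimal sketch of that behavior; the `toy` DataFrame and its numbers are hypothetical and are not the course's `banking` data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (an assumption for illustration, not the banking dataset)
toy = pd.DataFrame({
    "acct_amount": [100.0, np.nan, 300.0, np.nan],
    "inv_amount": [10.0, 20.0, 30.0, 40.0],
})

# Estimate for every row, sharing toy's index
estimate = toy["inv_amount"] * 5

# Only the NaN entries of acct_amount take the estimate at the same index
filled = toy.fillna({"acct_amount": estimate})
print(filled)
```

Rows that already had an `acct_amount` keep it, even where the estimate differs from the recorded value.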
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Drop missing values of cust_id
banking_fullid = banking.____(subset = ['____'])
# Compute estimated acct_amount
acct_imp = ____
# Impute missing acct_amount with corresponding acct_imp
banking_imputed = banking_fullid.____({'____':____})
# Print number of missing values
print(banking_imputed.isna().sum())
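For reference, here is one way the blanks could be filled in, run end to end on a small stand-in for `banking`. This is a sketch under stated assumptions rather than the official solution: the DataFrame contents below are invented for illustration, and only the column names and the "5 times `inv_amount`" rule come from the exercise text.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the exercise's banking DataFrame
banking = pd.DataFrame({
    "cust_id": ["A1", None, "C3", "D4"],
    "acct_amount": [500.0, 250.0, np.nan, 1000.0],
    "inv_amount": [100.0, 50.0, 80.0, 200.0],
})

# Drop only the rows where cust_id is missing
banking_fullid = banking.dropna(subset=["cust_id"])

# Estimated acct_amount from the domain rule: 5 times inv_amount
acct_imp = banking_fullid["inv_amount"] * 5

# Fill missing acct_amount values with the estimate at the same index
banking_imputed = banking_fullid.fillna({"acct_amount": acct_imp})

# Count the remaining missing values per column
print(banking_imputed.isna().sum())
```

Passing `subset=["cust_id"]` matters here: a bare `.dropna()` would also discard the rows with a missing `acct_amount`, which are exactly the rows the imputation step is meant to keep.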