Follow the money
In this exercise, you're working with another version of the banking DataFrame that contains missing values for both the cust_id column and the acct_amount column.
You want to analyze how many unique customers the bank has, the average amount held by customers, and more. Rows with a missing cust_id don't help with this analysis, and you know that, on average, acct_amount is about 5 times inv_amount.
In this exercise, you will drop the rows of banking with a missing cust_id, then impute the missing values of acct_amount using this domain knowledge.
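Before relying on the "about 5 times" rule, it can help to check it against the rows where both amounts are known. The sketch below uses a small toy DataFrame whose column names match the exercise but whose values are invented for illustration. It also shows that Series.nunique() skips missing values, so rows without a cust_id add nothing to the customer count.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names match the exercise, values are made up.
banking = pd.DataFrame({
    "cust_id": ["A1", "B2", None, "C3", "A1"],
    "inv_amount": [1000.0, 2000.0, 500.0, 3000.0, 400.0],
    "acct_amount": [5100.0, np.nan, 2400.0, np.nan, 1900.0],
})

# nunique() ignores missing values, so the None cust_id is not counted.
n_customers = banking["cust_id"].nunique()

# Check the "about 5x" rule only on rows where both amounts are present.
complete = banking.dropna(subset=["acct_amount", "inv_amount"])
ratio = (complete["acct_amount"] / complete["inv_amount"]).mean()

print(n_customers)
print(round(ratio, 2))
```

On real data, a ratio far from 5 would suggest the multiplier needs revisiting before it is used for imputation.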
This exercise is part of the course
Cleaning Data in Python
Exercise instructions
- Use .dropna() to drop missing values of the cust_id column in banking and store the results in banking_fullid.
- Use inv_amount to compute the estimated account amounts for banking_fullid by setting the amounts equal to inv_amount * 5, and assign the results to acct_imp.
- Impute the missing values of acct_amount in banking_fullid with the newly created acct_imp using .fillna().
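The three steps above can be traced on a small, made-up DataFrame (invented values, exercise column names). The point this sketch highlights is index alignment: after dropna() the index has a gap, but acct_imp keeps banking_fullid's index labels, and fillna() with a {column: Series} dict matches values by label, so each missing acct_amount receives the estimate from its own row.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for banking.
banking = pd.DataFrame({
    "cust_id": ["A1", None, "B2", "C3"],
    "inv_amount": [1000.0, 500.0, 2000.0, 3000.0],
    "acct_amount": [5100.0, np.nan, np.nan, 14800.0],
})

# Step 1: drop rows whose cust_id is missing; index label 1 disappears.
banking_fullid = banking.dropna(subset=["cust_id"])

# Step 2: estimated amounts; acct_imp shares banking_fullid's index (0, 2, 3).
acct_imp = banking_fullid["inv_amount"] * 5

# Step 3: the dict value is a Series, so fillna aligns it on the index and
# only fills the cells of acct_amount that are missing.
banking_imputed = banking_fullid.fillna({"acct_amount": acct_imp})

# Show just the rows that were imputed, for a quick visual check.
was_missing = banking_fullid["acct_amount"].isna()
print(banking_imputed.loc[was_missing, ["cust_id", "inv_amount", "acct_amount"]])
```

Existing acct_amount values (here 5100.0 and 14800.0) are left unchanged; only the row with a missing amount gets inv_amount * 5.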
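One way to complete the sample code, assuming banking is already loaded as a pandas DataFrame:
import pandas as pd  # banking is assumed to be a pandas DataFrame already loaded in the session

# Drop rows with a missing cust_id
banking_fullid = banking.dropna(subset=['cust_id'])
# Compute estimated acct_amount from the domain rule (about 5 * inv_amount)
acct_imp = banking_fullid['inv_amount'] * 5
# Impute missing acct_amount with the corresponding acct_imp value (aligned on index)
banking_imputed = banking_fullid.fillna({'acct_amount': acct_imp})
# Print the number of missing values per column
print(banking_imputed.isna().sum())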
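As a variation that is not part of the original exercise, the same imputation can be written by filling the single column with Series.fillna() and writing it back with DataFrame.assign(). The sketch below uses invented toy data and checks that both forms produce the same DataFrame; which one to use is mostly a matter of taste, though the dict form extends more easily to filling several columns at once.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with the exercise's column names.
banking = pd.DataFrame({
    "cust_id": ["A1", None, "B2"],
    "inv_amount": [1000.0, 500.0, 2000.0],
    "acct_amount": [np.nan, 2400.0, 9900.0],
})
banking_fullid = banking.dropna(subset=["cust_id"])
acct_imp = banking_fullid["inv_amount"] * 5

# Dict form used in the exercise.
via_dict = banking_fullid.fillna({"acct_amount": acct_imp})

# Alternative: fill one column, then replace it in a new frame via assign().
via_assign = banking_fullid.assign(
    acct_amount=banking_fullid["acct_amount"].fillna(acct_imp)
)

print(via_dict.equals(via_assign))  # compares values, dtypes, and labels
print(via_assign.isna().sum())      # missing-value count per column
```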