Exercise

Follow the money

In this exercise, you're working with another version of the banking DataFrame that contains missing values for both the cust_id column and the acct_amount column.

You want to analyze how many unique customers the bank has, the average amount held by customers, and more. Rows with a missing cust_id don't help with that analysis, and you know that acct_amount is typically about 5 times inv_amount.

You will drop the rows of banking with a missing cust_id, and impute the missing values of acct_amount using this domain knowledge.

Instructions

  • Use .dropna() to drop the rows of banking that have missing values in the cust_id column, and store the results in banking_fullid.
  • Use inv_amount to compute the estimated account amounts for banking_fullid by setting the amounts equal to inv_amount * 5, and assign the results to acct_imp.
  • Impute the missing values of acct_amount in banking_fullid with the newly created acct_imp using .fillna().
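
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the exercise's reference solution: the toy banking DataFrame and its values are made up so the snippet runs on its own, and the name banking_imputed is a hypothetical choice for the result. Only banking_fullid and acct_imp come from the instructions.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the exercise's banking DataFrame (values are invented)
banking = pd.DataFrame({
    "cust_id": ["A1", None, "C3", "D4"],
    "acct_amount": [1000.0, 2000.0, np.nan, 500.0],
    "inv_amount": [200.0, 400.0, 300.0, 100.0],
})

# Step 1: drop only the rows where cust_id is missing
banking_fullid = banking.dropna(subset=["cust_id"])

# Step 2: estimate account amounts from the domain rule acct_amount ~ 5 * inv_amount
acct_imp = banking_fullid["inv_amount"] * 5

# Step 3: fill missing acct_amount values with the estimates (aligned on the index);
# banking_imputed is a hypothetical name for the result
banking_imputed = banking_fullid.assign(
    acct_amount=banking_fullid["acct_amount"].fillna(acct_imp)
)

# Count of remaining missing values per column
print(banking_imputed.isna().sum())
```

Passing subset=["cust_id"] limits the drop to rows missing an ID, so rows that are only missing acct_amount survive and can be imputed in the last step. Existing acct_amount values are left untouched; only the gaps take the estimate.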