Treating missing data
In this exercise, you're working with another version of the accounts data that contains missing values for both the cust_id and acct_amount columns.
You want to figure out how many unique customers the bank has, as well as the average amount held by customers. You know that rows with missing cust_id don't really help you, and that on average, the acct_amount is usually 5 times the amount of inv_amount.
In this exercise, you will drop rows of accounts with missing cust_id values, and impute missing values of acct_amount with some domain knowledge. dplyr and assertive are loaded and accounts is available.
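The two steps described above can be sketched on a small made-up data frame. This is only an illustration under stated assumptions: accounts_toy and its values are hypothetical stand-ins for the course's accounts data, and the 5x multiplier comes from the rule of thumb stated in the exercise.

```r
library(dplyr)

# Hypothetical toy data standing in for the course's accounts data frame
accounts_toy <- data.frame(
  cust_id = c("A1", NA, "C3", "D4"),
  acct_amount = c(5000, 1200, NA, 800),
  inv_amount = c(1000, 300, 400, 150)
)

accounts_toy_clean <- accounts_toy %>%
  # Drop rows where cust_id is missing
  filter(!is.na(cust_id)) %>%
  # Replace a missing acct_amount with 5 times inv_amount; keep known values
  mutate(acct_amount = ifelse(is.na(acct_amount), inv_amount * 5, acct_amount))

accounts_toy_clean
```

Using ifelse() leaves observed acct_amount values untouched and only fills the gaps, so the imputation never overwrites real data.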
This exercise is part of the course Cleaning Data in R.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create accounts_clean
accounts_clean <- accounts %>%
  # Filter to remove rows with missing cust_id
  ___

accounts_clean
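Once the data has been cleaned, the two questions from the exercise description (how many unique customers, and the average amount held) could be answered along these lines. This is a sketch on hypothetical data: accounts_checked and its values are invented for illustration and are not the course's data.

```r
library(dplyr)

# Hypothetical already-cleaned data (no missing cust_id or acct_amount)
accounts_checked <- data.frame(
  cust_id = c("A1", "C3", "D4", "D4"),
  acct_amount = c(5000, 2000, 800, 1200)
)

# Count remaining missing values per column as a sanity check
missing_counts <- colSums(is.na(accounts_checked))

summary_stats <- accounts_checked %>%
  summarize(
    n_customers = n_distinct(cust_id),    # number of unique customers
    mean_acct_amount = mean(acct_amount)  # average amount held per account
  )

missing_counts
summary_stats
```

Checking the missing-value counts first matters because mean() returns NA if any acct_amount is still missing.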