How's our data integrity?
New data has been merged into the banking DataFrame that contains details on how investments in the inv_amount column are allocated across four different funds A, B, C and D. Furthermore, the age and birthdays of customers are now stored in the age and birth_date columns respectively.
You want to understand how customers of different age groups invest. First, however, you want to make sure the data you're analyzing is correct. You will do so by cross-field checking the values of inv_amount and age against the amounts invested in the different funds and the customers' birthdays.
Both pandas and datetime have been imported as pd and dt respectively.
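As a rough illustration of the age side of this cross-field check, here is a minimal sketch on made-up data. The three rows, the fixed reference_date, and the names ages_from_birth, age_equ, consistent_ages and inconsistent_ages are all assumptions for this example, not part of the exercise. Age is approximated from the year difference alone, which ignores whether the birthday has already passed in the reference year.

```python
import datetime as dt
import pandas as pd

# Toy stand-in for the banking DataFrame; all values are invented
banking = pd.DataFrame({
    "birth_date": pd.to_datetime(["1990-05-17", "1975-11-02", "2001-01-30"]),
    "age": [30, 45, 25],
})

# Fixed reference date (an assumption) so the result is reproducible;
# a live analysis might use dt.date.today() instead
reference_date = dt.date(2020, 12, 31)

# Approximate each customer's age from the birth year only
ages_from_birth = reference_date.year - banking["birth_date"].dt.year

# Boolean mask: True where the stored age matches the computed one
age_equ = banking["age"] == ages_from_birth

# Split the rows by whether the two fields agree
consistent_ages = banking[age_equ]
inconsistent_ages = banking[~age_equ]

print("Number of inconsistent ages: ", inconsistent_ages.shape[0])
```

In this toy data only the third row disagrees: the stored age is 25, while the birth year implies 19.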
This exercise is part of the course Cleaning Data in Python.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Store fund columns to sum against
fund_columns = ['fund_A', 'fund_B', 'fund_C', 'fund_D']
# Find rows where fund_columns row sum == inv_amount
inv_equ = banking[____].____(____) == ____
# Store consistent and inconsistent data
consistent_inv = ____[____]
inconsistent_inv = ____[____]
# Print the number of inconsistent investments
print("Number of inconsistent investments: ", inconsistent_inv.shape[0])
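One possible way to approach the template above is sketched below on a small, made-up banking DataFrame; the three rows are invented for illustration and are not the course data. The idea is to sum the fund columns row by row with sum(axis=1), compare the result with inv_amount to get a boolean mask, and then use the mask and its negation to split the rows.

```python
import pandas as pd

# Toy stand-in for the banking DataFrame; all values are invented
banking = pd.DataFrame({
    "inv_amount": [100.0, 250.0, 80.0],
    "fund_A": [25.0, 100.0, 20.0],
    "fund_B": [25.0, 50.0, 20.0],
    "fund_C": [25.0, 50.0, 20.0],
    "fund_D": [25.0, 40.0, 20.0],
})

# Store fund columns to sum against
fund_columns = ["fund_A", "fund_B", "fund_C", "fund_D"]

# Row-wise sum of the funds (axis=1) compared with the stated total
inv_equ = banking[fund_columns].sum(axis=1) == banking["inv_amount"]

# Rows where the fields agree, and rows where they do not
consistent_inv = banking[inv_equ]
inconsistent_inv = banking[~inv_equ]

print("Number of inconsistent investments: ", inconsistent_inv.shape[0])
```

Here only the second row is flagged, because its funds add up to 240 rather than 250. Exact equality works for these round numbers; with real floating-point amounts, a tolerance-based comparison may be a safer choice.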