Removing missing data
You replaced missing data in person_emp_length, but in the previous exercise you saw that loan_int_rate has missing data as well.
Similar to having missing data within loan_status, having missing data within loan_int_rate will make predictions difficult.
Because interest rates are set by your company, having missing data in this column is very strange. It's possible that data ingestion issues created errors, but you cannot know for sure. For now, it's best to .drop() these records before moving forward.
The data set cr_loan has been loaded in the workspace.
This exercise is part of the course Credit Risk Modeling in Python
Exercise instructions
- Print the number of records that contain missing data for interest rate.
- Create an array called indices that holds the indices of the rows with a missing interest rate.
- Drop the records with missing interest rate data and save the results to cr_loan_clean.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Print the number of nulls
print(____[____].____().____())
# Store the array of indices
____ = ____[____[____].____].____
# Save the new data without missing data
____ = ____.____(____)
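
For reference, one way the three steps can fit together is sketched below on a small made-up DataFrame. The toy values and the n_missing variable are assumptions for illustration only; the real cr_loan has more rows and columns, and the blanks above may be filled in other equivalent ways.

```python
import numpy as np
import pandas as pd

# Toy stand-in for cr_loan (values are invented for illustration)
cr_loan = pd.DataFrame({
    "loan_int_rate": [11.5, np.nan, 7.9, np.nan, 13.2],
    "loan_status": [0, 1, 0, 0, 1],
})

# Count the rows whose interest rate is missing
n_missing = cr_loan["loan_int_rate"].isnull().sum()
print(n_missing)

# Collect the index labels of the rows with a missing interest rate
indices = cr_loan[cr_loan["loan_int_rate"].isnull()].index

# Drop those rows; drop() returns a new DataFrame and leaves cr_loan unchanged
cr_loan_clean = cr_loan.drop(indices)
```

Because .drop() is given row labels rather than positions, this works as long as the index labels are unique, which is the case for a default integer index.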