Get startedGet started for free

Replacing missing credit data

Now, you should check for missing data. If you find missing data within loan_status, you would not be able to use the data for predicting probability of default because you wouldn't know if the loan was a default or not. Missing data within person_emp_length would not be as damaging, but would still cause training errors.

So, check for missing data in the person_emp_length column and replace any missing values with the median.

The data set cr_loan has been loaded in the workspace.

This exercise is part of the course

Credit Risk Modeling in Python

View Course

Exercise instructions

  • Print an array of column names that contain missing data using .isnull().
  • Print the top five rows of the data set that has missing data for person_emp_length.
  • Replace the missing data with the median of all the employment length using .fillna().
  • Create a histogram of the person_emp_length column to check the distribution.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Print a null value column array
print(____.columns[____.____().any()])

# Print the top five rows with nulls for employment length
print(____[____[____].____()].head())

# Impute the null values with the median value for all employment lengths
____[____].____((cr_loan['person_emp_length'].____()), inplace=True)

# Create a histogram of employment length
n, bins, patches = plt.____(____[____], bins='auto', color='blue')
plt.xlabel("Person Employment Length")
plt.____()
Edit and Run Code