Replacing missing credit data

Next, check for missing data. If loan_status is missing, the row cannot be used to predict probability of default, because you would not know whether the loan defaulted. Missing values in person_emp_length are less damaging, but they would still cause errors during training.

So, check for missing data in the person_emp_length column and replace any missing values with the median.

The data set cr_loan has been loaded in the workspace.

Print an array of column names that contain missing data using .isnull().
Print the top five rows of the data set where person_emp_length is missing.
Replace the missing values with the median of person_emp_length using .fillna().
Create a histogram of the person_emp_length column to check the distribution.
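
The four steps above can be sketched as follows. This is a minimal illustration, not the exercise's reference solution: the small cr_loan frame built here is a hypothetical stand-in for the data set loaded in the workspace, and its values, the names null_columns and emp_length_median, and the plot styling are all assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for cr_loan; the real data set has more rows and columns.
cr_loan = pd.DataFrame({
    "person_age": [22, 25, 31, 40, 28, 35, 23],
    "person_emp_length": [1.0, np.nan, 5.0, 12.0, np.nan, 3.0, 4.0],
    "loan_int_rate": [11.1, 7.5, np.nan, 13.2, 9.9, 10.4, 8.0],
    "loan_status": [0, 1, 0, 0, 1, 0, 0],
})

# Names of the columns that contain at least one missing value
null_columns = cr_loan.columns[cr_loan.isnull().any()]
print(null_columns)

# First five rows where person_emp_length is missing
print(cr_loan[cr_loan["person_emp_length"].isnull()].head())

# Median of the observed employment lengths (median() skips NaN by default)
emp_length_median = cr_loan["person_emp_length"].median()

# Replace the missing values with that median
cr_loan["person_emp_length"] = cr_loan["person_emp_length"].fillna(emp_length_median)

# Histogram of employment length after imputation
plt.hist(cr_loan["person_emp_length"], bins="auto", color="blue")
plt.xlabel("Person Employment Length")
plt.show()
```

Assigning the result of .fillna() back to the column, rather than relying on inplace=True, keeps the replacement explicit and avoids modifying a temporary copy.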

