
Replacing missing credit data

Now, check for missing data. If values are missing in loan_status, those rows cannot be used to predict the probability of default, because you would not know whether the loan defaulted. Missing values in person_emp_length are less damaging, but they would still cause errors when you train a model.
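A minimal sketch of that check, run on a small made-up stand-in for cr_loan (the values are hypothetical; only the column names come from the exercise). `.isnull().sum()` counts missing values per column. Rows whose loan_status is missing carry no default label; dropping them with `dropna(subset=...)` is one common option, assumed here rather than taken from the course.

```python
import numpy as np
import pandas as pd

# Hypothetical toy stand-in for cr_loan; only the column names match the exercise.
cr_loan = pd.DataFrame({
    "person_emp_length": [5.0, np.nan, 2.0, 8.0, np.nan],
    "loan_status": [0, 1, np.nan, 0, 1],
})

# Count missing values in each column.
null_counts = cr_loan.isnull().sum()
print(null_counts)

# A row without loan_status has no default/non-default label, so it cannot
# be used to train a probability-of-default model; drop such rows.
labeled = cr_loan.dropna(subset=["loan_status"])
print(len(labeled))
```

Here person_emp_length has two missing values and loan_status has one, so four labeled rows remain.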

So, check for missing data in the person_emp_length column and replace any missing values with the median.
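A minimal sketch of median imputation on hypothetical employment lengths. `Series.median()` skips NaN by default, and the median is less sensitive than the mean to a few very long employment lengths. The sketch assigns the filled column back instead of calling `fillna(..., inplace=True)` on a column selection, a chained pattern that recent pandas versions warn about.

```python
import numpy as np
import pandas as pd

# Hypothetical employment lengths (in years) with two missing entries.
cr_loan = pd.DataFrame({"person_emp_length": [1.0, np.nan, 3.0, 10.0, np.nan, 4.0]})

# median() ignores NaN, so this is the median of 1, 3, 4 and 10.
emp_median = cr_loan["person_emp_length"].median()

# Assign the filled column back to the DataFrame explicitly.
cr_loan["person_emp_length"] = cr_loan["person_emp_length"].fillna(emp_median)

print(emp_median)
print(cr_loan["person_emp_length"].tolist())
```

The median here is 3.5, and both missing entries become 3.5.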

The data set cr_loan has already been loaded into the workspace.

This exercise is part of the course

Credit Risk Modeling in Python


Exercise instructions

  • Print an array of column names that contain missing data using .isnull().
  • Print the top five rows of the data set that have missing data for person_emp_length.
  • Replace the missing data with the median of all employment lengths using .fillna().
  • Create a histogram of the person_emp_length column to check the distribution.
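The four steps above can be sketched end to end on a hypothetical toy stand-in for cr_loan (made-up values; person_age is an extra illustrative column). The sketch selects matplotlib's non-interactive "Agg" backend so it runs without a display; in a notebook or interactive session you would call `plt.show()` instead of `plt.close()`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy stand-in for cr_loan (made-up values).
cr_loan = pd.DataFrame({
    "person_age": [22, 35, 41, 28, 30, 52],
    "person_emp_length": [1.0, np.nan, 7.0, 3.0, np.nan, 20.0],
    "loan_status": [0, 1, 0, 0, 1, 0],
})

# Step 1: names of columns that contain at least one null value.
null_columns = cr_loan.columns[cr_loan.isnull().any()]
print(null_columns)

# Step 2: first five rows where person_emp_length is null.
print(cr_loan[cr_loan["person_emp_length"].isnull()].head())

# Step 3: replace the nulls with the median employment length.
emp_median = cr_loan["person_emp_length"].median()
cr_loan["person_emp_length"] = cr_loan["person_emp_length"].fillna(emp_median)

# Step 4: histogram of the imputed column to inspect its distribution.
n, bins, patches = plt.hist(cr_loan["person_emp_length"], bins="auto", color="blue")
plt.xlabel("Person Employment Length")
plt.close()  # use plt.show() here in an interactive session
```

Because every missing value receives the same number, the histogram after imputation shows extra mass at the median (in this toy data, two extra rows at 5.0).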

Interactive exercise

Complete the sample code to finish this exercise successfully.

import matplotlib.pyplot as plt

# Print an array of the names of columns that contain null values
print(____.columns[____.____().any()])

# Print the top five rows with nulls for employment length
print(____[____[____].____()].head())

# Impute the null values with the median of all employment lengths
# (assign the result back rather than calling inplace=True on a column selection)
____[____] = ____[____].____(cr_loan['person_emp_length'].____())

# Create a histogram of employment length
n, bins, patches = plt.____(____[____], bins='auto', color='blue')
plt.xlabel("Person Employment Length")
plt.____()