Exercise

The hunt for missing values

Questions about handling missing values are a staple of machine learning interviews. If you are given a dataset with missing values, leaving them unaddressed will likely skew your results and lower your model's accuracy.

In this exercise, you'll practice the first pre-processing step: finding missing values and exploring ways to handle them, using pandas and numpy on a customer loan dataset.

The dataset, which you'll use for many of the exercises in this course, is saved to your workspace as loan_data.

This is where you are in the pipeline:

[Figure: Machine learning pipeline]

Instructions

  1. Print out the features of loan_data along with the number of missing values in each.
  2. Drop the rows with missing values and print the percentage of rows remaining.
  3. Drop the columns with missing values and print the percentage of columns remaining.
  4. Impute loan_data's missing values with 0 and store the result in loan_data_filled.
     Then compare 'Credit Score' using .describe() before imputation (loan_data) and after (loan_data_filled).
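
The four steps above could be sketched roughly as follows. This is one possible approach, not the course's reference solution. The real loan_data lives in the course workspace, so the small DataFrame built here is a hypothetical stand-in: its values and its 'Loan Status' and 'Annual Income' columns are assumptions, and only 'Credit Score' comes from the instructions. The intermediate names (missing_counts, dropped_rows, dropped_cols, pct_rows_remaining, pct_cols_remaining) are illustrative as well.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for loan_data (assumption: made-up rows and columns,
# used only so the sketch runs outside the course workspace).
loan_data = pd.DataFrame({
    "Loan Status": ["Fully Paid", "Charged Off", "Fully Paid", "Fully Paid", "Charged Off"],
    "Credit Score": [720.0, np.nan, 680.0, 750.0, np.nan],
    "Annual Income": [55000.0, 62000.0, np.nan, 48000.0, 71000.0],
})

# Step 1: list each feature with its number of missing values.
missing_counts = loan_data.isna().sum()
print(missing_counts)

# Step 2: drop rows containing any missing value, then report rows remaining.
dropped_rows = loan_data.dropna()
pct_rows_remaining = 100 * dropped_rows.shape[0] / loan_data.shape[0]
print(f"Rows remaining: {pct_rows_remaining:.1f}%")

# Step 3: drop columns containing any missing value, then report columns remaining.
dropped_cols = loan_data.dropna(axis=1)
pct_cols_remaining = 100 * dropped_cols.shape[1] / loan_data.shape[1]
print(f"Columns remaining: {pct_cols_remaining:.1f}%")

# Step 4: impute missing values with 0 into a new DataFrame,
# then compare the 'Credit Score' summary before and after.
loan_data_filled = loan_data.fillna(0)
print(loan_data["Credit Score"].describe())
print(loan_data_filled["Credit Score"].describe())
```

On this toy data, filling with 0 raises the 'Credit Score' count from 3 to 5 but pulls the mean down sharply, because 0 is not a plausible credit score. That shift is the kind of distortion the .describe() comparison is meant to expose.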