The hunt for missing values

Questions about handling missing values come up in almost any machine learning interview. If you are given a dataset with missing values and do not address them, your results will likely be skewed and your model's accuracy will likely drop.

In this exercise, you'll practice the first pre-processing step: finding missing values and exploring ways to handle them, using pandas and numpy on a customer loan dataset.

The dataset, which you'll use for many of the exercises in this course, is saved to your workspace as loan_data.

This is where you are in the pipeline:

[Figure: Machine learning pipeline]

1. Print out the features of loan_data along with the number of missing values in each.
2. Drop the rows with missing values and print the percentage of rows remaining.
3. Drop the columns with missing values and print the percentage of columns remaining.
4. Impute loan_data's missing values with 0 and store the result in loan_data_filled. Then compare 'Credit Score' using .describe() before imputation (on loan_data) and after imputation (on loan_data_filled).
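
The four steps can be sketched as follows. This is only a minimal illustration, not the course's reference solution: the real loan_data lives in the course workspace, so a small made-up DataFrame stands in for it here. Apart from 'Credit Score', every column name and value below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for loan_data. Only 'Credit Score' is named in the
# exercise; the other columns and all values are invented for illustration.
loan_data = pd.DataFrame({
    "Loan Amount": [10000, 25000, 5000, 40000, 15000],
    "Credit Score": [720.0, np.nan, 680.0, np.nan, 700.0],
    "Annual Income": [55000.0, 82000.0, np.nan, 120000.0, 61000.0],
})

# Step 1: number of missing values per feature.
missing_per_column = loan_data.isna().sum()
print(missing_per_column)

# Step 2: drop rows containing any missing value, then report what is left.
rows_dropped = loan_data.dropna()
pct_rows_remaining = 100 * rows_dropped.shape[0] / loan_data.shape[0]
print(f"Rows remaining: {pct_rows_remaining:.1f}%")

# Step 3: drop columns containing any missing value, then report what is left.
cols_dropped = loan_data.dropna(axis=1)
pct_cols_remaining = 100 * cols_dropped.shape[1] / loan_data.shape[1]
print(f"Columns remaining: {pct_cols_remaining:.1f}%")

# Step 4: impute missing values with 0 into a new DataFrame.
loan_data_filled = loan_data.fillna(0)

# Compare 'Credit Score' before and after imputation.
print(loan_data["Credit Score"].describe())
print(loan_data_filled["Credit Score"].describe())
```

Note that .describe() skips missing values, so the count for 'Credit Score' is smaller before imputation than after. Filling with 0 also pulls the mean and minimum down, which is one reason a constant of 0 may be a poor choice for a feature like a credit score.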

