Visualize imputations

Analyzing imputations and choosing the best one is a task that requires a lot of experimentation. It is important to make sure that imputing does not bias your data. In the last two exercises, you created four different imputations using mean, median, mode, and constant filling.

In this exercise, you'll create a scatterplot of each of the DataFrames you imputed previously. To achieve this, you'll create a dictionary of the DataFrames, keyed by the titles of their plots.

The DataFrames diabetes_mean, diabetes_median, diabetes_mode and diabetes_constant have been loaded for you.
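
For readers without the preloaded environment, here is a minimal sketch of how four such DataFrames could be built. It assumes plain pandas `fillna` and a small, made-up stand-in for `diabetes`; the earlier exercises may have built them differently, and the constant `0` is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the course's diabetes DataFrame
diabetes = pd.DataFrame({
    "Serum_Insulin": [94.0, np.nan, 168.0, 94.0, np.nan, 88.0],
    "Glucose": [89.0, 137.0, np.nan, 78.0, 116.0, 89.0],
})

# Fill each column's missing values with that column's statistic
diabetes_mean = diabetes.fillna(diabetes.mean())
diabetes_median = diabetes.fillna(diabetes.median())
# mode() can return several rows on ties; the first row is used here
diabetes_mode = diabetes.fillna(diabetes.mode().iloc[0])
# Constant filling with 0 (an assumed constant)
diabetes_constant = diabetes.fillna(0)
```

Each result has the same shape and index as `diabetes`, so a row-wise mask computed on the original data still lines up with the imputed rows.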

This exercise is part of the course

Dealing with Missing Data in Python

Exercise instructions

  • Create 4 subplots by making a plot with 2 rows and 2 columns.
  • Create the dictionary imputations by mapping each key to its matching DataFrame.
  • Loop over axes and imputations, and plot each DataFrame in imputations.
  • Set the color to the nullity and the title for each subplot to the name of the imputation.
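
The "nullity" used for coloring is a boolean mask that marks rows where either plotted column was originally missing. A small sketch, using hypothetical toy values rather than the course data, shows that adding two boolean Series behaves like a logical OR and that `|` expresses the same thing more explicitly:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the course's diabetes DataFrame
diabetes = pd.DataFrame({
    "Serum_Insulin": [94.0, np.nan, 168.0, np.nan],
    "Glucose": [89.0, 137.0, np.nan, np.nan],
})

# Adding two boolean Series acts as an element-wise OR
nullity_sum = diabetes.Serum_Insulin.isnull() + diabetes.Glucose.isnull()

# The same mask written with the explicit OR operator
nullity_or = diabetes.Serum_Insulin.isnull() | diabetes.Glucose.isnull()

print(nullity_or.tolist())
```

Because the mask is computed on the original data, imputed points can be told apart from observed ones after the gaps have been filled.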

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Set nrows and ncols to 2
fig, axes = plt.subplots(nrows=___, ncols=___, figsize=(10, 10))
nullity = diabetes.Serum_Insulin.isnull()+diabetes.Glucose.isnull()

# Create a dictionary of imputations
imputations = {'Mean Imputation': ___, 'Median Imputation': ___, 
               'Most Frequent Imputation': ___, 'Constant Imputation': ___}

# Loop over flattened axes and imputations
for ax, df_key in zip(___.___(), ___):
    # Select and also set the title for a DataFrame
    imputations[___].plot(x='Serum_Insulin', y='Glucose', kind='scatter', 
                          alpha=0.5, c=___, cmap='rainbow', ax=ax, 
                          colorbar=False, title=___)
plt.show()
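
As a self-contained illustration of the same loop pattern, the sketch below runs end to end on made-up data. The toy DataFrame, the `fillna`-based imputations, the constant `0`, and the `Agg` backend are all assumptions for the example, not the course's setup. Iterating over a dictionary yields its keys, so `zip` pairs each flattened axis with one title.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the course's diabetes DataFrame
diabetes = pd.DataFrame({
    "Serum_Insulin": [94.0, np.nan, 168.0, 94.0, np.nan, 88.0],
    "Glucose": [89.0, 137.0, np.nan, 78.0, 116.0, 89.0],
})

# True where either plotted column was originally missing
nullity = diabetes["Serum_Insulin"].isnull() | diabetes["Glucose"].isnull()

# Keys double as subplot titles
imputations = {
    "Mean Imputation": diabetes.fillna(diabetes.mean()),
    "Median Imputation": diabetes.fillna(diabetes.median()),
    "Most Frequent Imputation": diabetes.fillna(diabetes.mode().iloc[0]),
    "Constant Imputation": diabetes.fillna(0),
}

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 10))

# axes is a 2x2 array; flatten() turns it into four axes in row order
for ax, df_key in zip(axes.flatten(), imputations):
    imputations[df_key].plot(
        x="Serum_Insulin", y="Glucose", kind="scatter", alpha=0.5,
        c=nullity.astype(int),  # 0 = observed, 1 = imputed (cast is a choice made here)
        cmap="rainbow", ax=ax, colorbar=False, title=df_key,
    )

# With an interactive backend, plt.show() would display the figure here
```

In a plot like this, imputed points that collapse onto a single horizontal or vertical line are a visual hint that the imputation has distorted the relationship between the two variables.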