Visualize imputations

Analyzing imputations and choosing the best one is a task that requires a lot of experimentation. It is important to make sure that imputing does not bias your data. In the last two exercises, you created 4 different imputations using mean, median, mode, and constant-fill strategies.

In this exercise, you'll create a scatterplot of each of the DataFrames you imputed previously. To achieve this, you'll create a dictionary of the DataFrames, with each key being the title of the corresponding plot.

The DataFrames diabetes_mean, diabetes_median, diabetes_mode and diabetes_constant have been loaded for you.
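As a reminder of what the four strategies do, here is a minimal sketch on a tiny made-up DataFrame. The name `toy`, its values, and the constant 0 are assumptions for illustration only, and plain pandas `fillna` is used as a stand-in; it is not necessarily how the preloaded DataFrames were produced.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the course's diabetes DataFrame.
toy = pd.DataFrame({
    "Serum_Insulin": [94.0, np.nan, 168.0, 94.0, np.nan],
    "Glucose": [89.0, 137.0, np.nan, 89.0, 116.0],
})

# Each strategy fills the missing cells column by column.
toy_mean = toy.fillna(toy.mean())
toy_median = toy.fillna(toy.median())
toy_mode = toy.fillna(toy.mode().iloc[0])  # first (smallest) mode of each column
toy_constant = toy.fillna(0)               # 0 is an arbitrary constant

# Compare what each strategy put into one originally missing cell.
for name, frame in [("mean", toy_mean), ("median", toy_median),
                    ("mode", toy_mode), ("constant", toy_constant)]:
    print(name, frame.loc[1, "Serum_Insulin"])
```

Even on five rows the strategies disagree about the filled value, which is why comparing them visually is worthwhile.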

This exercise is part of the course

<Kurs>Dealing with Missing Data in Python</Kurs>

Exercise instructions

Create 4 subplots by making a plot with 2 rows and 2 columns.
Create the dictionary imputations by mapping each key to its matching DataFrame.
Loop over the flattened axes and imputations, and plot each DataFrame in imputations.
Set the color to nullity, and set the title of each subplot to the name of the imputation.

Interactive hands-on exercise

Try this exercise by completing the sample code below.

# Set nrows and ncols to 2
fig, axes = plt.subplots(nrows=___, ncols=___, figsize=(10, 10))
# Flag rows where Serum_Insulin or Glucose was missing before imputation
nullity = diabetes.Serum_Insulin.isnull() | diabetes.Glucose.isnull()

# Create a dictionary of imputations
imputations = {'Mean Imputation': ___, 'Median Imputation': ___, 
               'Most Frequent Imputation': ___, 'Constant Imputation': ___}

# Loop over flattened axes and imputations
for ax, df_key in zip(___.___(), ___):
    # Select and also set the title for a DataFrame
    imputations[___].plot(x='Serum_Insulin', y='Glucose', kind='scatter', 
                          alpha=0.5, c=___, cmap='rainbow', ax=ax, 
                          colorbar=False, title=___)
plt.show()
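The same plotting pattern can be sketched end to end on synthetic data. Everything here is an assumption made so the example runs on its own: the random `toy` DataFrame, the `fillna`-based imputations, and the non-interactive `Agg` backend. It illustrates the subplot, dictionary, and `zip` structure rather than reproducing the course's dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical synthetic data with values knocked out at random.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Serum_Insulin": rng.normal(120, 40, 200).round(),
    "Glucose": rng.normal(110, 25, 200).round(),
})
toy.loc[rng.choice(200, 40, replace=False), "Serum_Insulin"] = np.nan
toy.loc[rng.choice(200, 20, replace=False), "Glucose"] = np.nan

# True for rows where either column was missing before imputation.
nullity = toy["Serum_Insulin"].isnull() | toy["Glucose"].isnull()

# Keys double as subplot titles.
imputations = {
    "Mean Imputation": toy.fillna(toy.mean()),
    "Median Imputation": toy.fillna(toy.median()),
    "Most Frequent Imputation": toy.fillna(toy.mode().iloc[0]),
    "Constant Imputation": toy.fillna(0),
}

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 10))

# axes is a 2x2 array; flatten() yields the four Axes in row-major order,
# and iterating over a dict yields its keys in insertion order.
for ax, title in zip(axes.flatten(), imputations):
    imputations[title].plot(x="Serum_Insulin", y="Glucose", kind="scatter",
                            alpha=0.5, c=nullity, cmap="rainbow", ax=ax,
                            colorbar=False, title=title)

fig.canvas.draw()  # render the figure; in an interactive session, call plt.show() instead
```

Because `nullity` is computed from the data before imputation, the imputed points take the second color in every panel, which makes it easy to see where each strategy places them relative to the observed points.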
