
Visualize imputations

Analyzing imputations and choosing the best one is a task that requires a lot of experimentation. It is important to make sure that your data does not become biased during imputation. In the last two exercises, you created 4 different imputations using mean, median, mode, and constant filling.
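One rough way to check for bias is to compare summary statistics before and after filling. The sketch below is only an illustration: it uses a small made-up Series rather than the course's diabetes data, and pandas `fillna` as a stand-in for whichever imputer produced your DataFrames.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with missing values (not the course's diabetes dataset)
glucose = pd.Series([85.0, 90.0, np.nan, 110.0, np.nan, 150.0, 95.0, np.nan])

# Observed values plus three simple fills, keyed by a descriptive title
series_by_name = {
    'Observed only': glucose,
    'Mean Imputation': glucose.fillna(glucose.mean()),
    'Median Imputation': glucose.fillna(glucose.median()),
    'Constant Imputation': glucose.fillna(0),
}

# One row per variant; mean() and std() skip NaN by default
summary = pd.DataFrame(
    {name: {'mean': s.mean(), 'std': s.std()} for name, s in series_by_name.items()}
).T
print(summary)
```

In this toy case, mean filling keeps the mean unchanged but shrinks the standard deviation, while filling with the constant 0 pulls the mean down. Shifts like these are what the scatterplots in this exercise help you spot visually.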

In this exercise, you'll create a scatterplot for each of the DataFrames you imputed previously. To achieve this, you'll create a dictionary of the DataFrames, using each imputation's title as its key.

The DataFrames diabetes_mean, diabetes_median, diabetes_mode and diabetes_constant have been loaded for you.

This exercise is part of the course

Dealing with Missing Data in Python


Exercise instructions

  • Create 4 subplots by making a plot with 2 rows and 2 columns.
  • Create the dictionary imputations by mapping each key to its matching DataFrame.
  • Loop over axes and imputations, and plot each DataFrame in imputations.
  • Set the color to the nullity and the title for each subplot to the name of the imputation.
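The pattern in these steps (a title-keyed dictionary zipped with flattened axes) can be sketched on synthetic data. Everything below is an assumption made for illustration: the random `df` stands in for the diabetes data, `fillna` stands in for the imputers used earlier in the course, and the nullity mask is cast to integers only to keep the color argument numeric.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the diabetes data, rounded so a mode exists
rng = np.random.default_rng(0)
df = pd.DataFrame({'Serum_Insulin': rng.normal(150, 50, 40).round(-1),
                   'Glucose': rng.normal(120, 30, 40).round(-1)})
df.loc[::4, 'Serum_Insulin'] = np.nan  # knock out every 4th insulin value
df.loc[::5, 'Glucose'] = np.nan        # knock out every 5th glucose value

# True for rows where either column was missing before imputation
nullity = df['Serum_Insulin'].isnull() | df['Glucose'].isnull()

# Dictionary of imputed DataFrames keyed by the subplot title
imputations = {
    'Mean Imputation': df.fillna(df.mean()),
    'Median Imputation': df.fillna(df.median()),
    'Most Frequent Imputation': df.fillna(df.mode().iloc[0]),
    'Constant Imputation': df.fillna(0),
}

# 2 x 2 grid; flatten() turns the 2D array of axes into a 1D sequence
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 10))
for ax, (title, imputed) in zip(axes.flatten(), imputations.items()):
    imputed.plot(x='Serum_Insulin', y='Glucose', kind='scatter', alpha=0.5,
                 c=nullity.astype(int), cmap='rainbow', ax=ax,
                 colorbar=False, title=title)
```

In an interactive session you would finish with `plt.show()`. Iterating over `imputations.items()` is one possible design choice; looping over the keys and indexing the dictionary inside the loop, as the exercise template does, is equivalent.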

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Set nrows and ncols to 2
fig, axes = plt.subplots(nrows=___, ncols=___, figsize=(10, 10))
nullity = diabetes.Serum_Insulin.isnull()+diabetes.Glucose.isnull()

# Create a dictionary of imputations
imputations = {'Mean Imputation': ___, 'Median Imputation': ___, 
               'Most Frequent Imputation': ___, 'Constant Imputation': ___}

# Loop over flattened axes and imputations
for ax, df_key in zip(___.___(), ___):
    # Select and also set the title for a DataFrame
    imputations[___].plot(x='Serum_Insulin', y='Glucose', kind='scatter', 
                          alpha=0.5, c=___, cmap='rainbow', ax=ax, 
                          colorbar=False, title=___)
plt.show()