Visualize imputations
Analyzing imputations and choosing the best one is a task that requires a lot of experimentation. It is important to make sure that imputing does not bias your data. In the last two exercises, you created four different imputations using mean, median, mode, and constant filling.
In this exercise, you'll create scatter plots of the DataFrames you imputed previously. To achieve this, you'll create a dictionary of the DataFrames, with each key being the title of its plot.
The DataFrames diabetes_mean, diabetes_median, diabetes_mode, and diabetes_constant have been loaded for you.
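For readers who do not have the earlier exercises at hand, here is a minimal sketch of how four imputed copies like these could be produced. It uses pandas `fillna` on a small made-up DataFrame; the `toy` data and the `toy_*` names are hypothetical, and the course itself may have built its DataFrames with a different tool.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the diabetes DataFrame
toy = pd.DataFrame({
    "Serum_Insulin": [94.0, np.nan, 168.0, 88.0, np.nan, 94.0],
    "Glucose": [89.0, 137.0, np.nan, 78.0, 116.0, 89.0],
})

# Fill missing values with each column's mean
toy_mean = toy.fillna(toy.mean())

# Fill missing values with each column's median
toy_median = toy.fillna(toy.median())

# Fill missing values with each column's most frequent value;
# mode() returns a DataFrame, so take its first row
toy_mode = toy.fillna(toy.mode().iloc[0])

# Fill missing values with a fixed constant (0 is an arbitrary choice here)
toy_constant = toy.fillna(0)
```

Each result has the same shape as `toy` and no missing values, which is what makes the four copies directly comparable in a plot.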
This exercise is part of the course
Dealing with Missing Data in Python
Exercise instructions
- Create 4 subplots by making a plot with 2 rows and 2 columns.
- Create the dictionary imputations by mapping each key to its matching DataFrame.
- Loop over axes and imputations, and plot each DataFrame in imputations.
- Set the color to the nullity and the title for each subplot to the name of the imputation.
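The pattern described in these steps can be sketched on made-up data as follows. Everything here is illustrative: the `toy` values, the `frames` dictionary, and its keys are hypothetical, and the `Agg` backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy data with missing values in both columns
toy = pd.DataFrame({
    "Serum_Insulin": [94.0, np.nan, 168.0, 88.0, np.nan, 94.0],
    "Glucose": [89.0, 137.0, np.nan, 78.0, 116.0, 89.0],
})

# True for rows where at least one of the two columns was missing
missing = toy["Serum_Insulin"].isnull() | toy["Glucose"].isnull()

# Dictionary keyed by plot title; insertion order is preserved
frames = {
    "Mean fill": toy.fillna(toy.mean()),
    "Median fill": toy.fillna(toy.median()),
    "Mode fill": toy.fillna(toy.mode().iloc[0]),
    "Zero fill": toy.fillna(0),
}

# A 2x2 grid gives a 2-D array of axes; flatten() turns it into 4 axes in a row
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8, 8))

# Iterating over a dict yields its keys, so zip pairs each axis with one title
for ax, name in zip(axes.flatten(), frames):
    frames[name].plot(
        x="Serum_Insulin", y="Glucose", kind="scatter",
        c=missing.astype(int),  # cast to int so the colormap gets numeric values
        cmap="rainbow", alpha=0.5, ax=ax, colorbar=False, title=name,
    )
```

Coloring by the missingness mask makes the imputed points stand out from the observed ones, so each panel shows where a given strategy places the filled-in values.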
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Set nrows and ncols to 2
fig, axes = plt.subplots(nrows=___, ncols=___, figsize=(10, 10))
nullity = diabetes.Serum_Insulin.isnull()+diabetes.Glucose.isnull()
# Create a dictionary of imputations
imputations = {'Mean Imputation': ___, 'Median Imputation': ___,
               'Most Frequent Imputation': ___, 'Constant Imputation': ___}
# Loop over flattened axes and imputations
for ax, df_key in zip(___.___(), ___):
    # Select and also set the title for a DataFrame
    imputations[___].plot(x='Serum_Insulin', y='Glucose', kind='scatter',
                          alpha=0.5, c=___, cmap='rainbow', ax=ax,
                          colorbar=False, title=___)
plt.show()