Generate scatter plot with missingness
In this exercise you'll create a scatter plot that shows both missing and non-missing values. You will use the function fill_dummy_values(), which you created in the previous exercise, to fill in dummy values and store the result in the DataFrame diabetes_dummy.
The nullity of a column is calculated using the .isnull() method, which returns a Series (pd.Series) of True or False values: True where a value is missing and False where it is present.
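As a small illustration, the sketch below calls .isnull() on a made-up two-column DataFrame (the values are invented and are not taken from the course's diabetes data) to show the boolean Series it returns.

```python
import numpy as np
import pandas as pd

# Invented toy data using the exercise's column names (not the real diabetes data)
df = pd.DataFrame({"BMI": [33.6, np.nan, 23.3], "Skin_Fold": [35.0, 29.0, np.nan]})

# .isnull() yields a boolean Series: True where the value is missing
bmi_nullity = df["BMI"].isnull()
print(type(bmi_nullity).__name__)  # Series
print(bmi_nullity.tolist())        # [False, True, False]
```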
To give missing and non-missing values different colors, combine the nullity of the two columns you are plotting with the element-wise OR (|) operator. The result is:
- True \(\rightarrow\) either col1 or col2, or both, is missing.
- False \(\rightarrow\) neither col1 nor col2 is missing.
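A minimal sketch of the OR step, again on invented toy data that reuses the exercise's column names: `|` combines the two boolean Series row by row.

```python
import numpy as np
import pandas as pd

# Invented toy data covering all four missingness combinations
df = pd.DataFrame({
    "BMI":       [33.6, np.nan, 23.3, np.nan],
    "Skin_Fold": [35.0, 29.0, np.nan, np.nan],
})

# Element-wise OR: True if BMI, Skin_Fold, or both are missing in that row
nullity = df["BMI"].isnull() | df["Skin_Fold"].isnull()
print(nullity.tolist())  # [False, True, True, True]
```

Python's `or` keyword does not work here: applied to a Series it raises a ValueError because the truth value of a Series is ambiguous, which is why the element-wise `|` operator is used.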
The DataFrame diabetes and the function fill_dummy_values() have been loaded for your use.
This exercise is part of the course Dealing with Missing Data in Python.
Exercise instructions
- Use the OR operation to combine the nullity of Skin_Fold and BMI.
- Fill dummy values in diabetes using the function fill_dummy_values() and store the result in diabetes_dummy.
- Create a scatter plot of 'BMI' versus 'Skin_Fold'; note that "Y versus X" means Y is plotted on the Y-axis against X on the X-axis, i.e., Y as a function of X.
Have a go at this exercise by completing this sample code.
import matplotlib.pyplot as plt

# Use OR operation to combine Skin_Fold and BMI nullity
nullity = ___
# Fill dummy values in diabetes and store the result in diabetes_dummy
diabetes_dummy = ___
# Create a scatter plot of BMI versus Skin_Fold
diabetes_dummy.plot(x=___, y=___, kind='___', alpha=0.5,
                    # Set color to nullity of BMI and Skin_Fold
                    c=___,
                    cmap='rainbow')
plt.show()
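For reference, here is one possible end-to-end sketch. Because the course's diabetes DataFrame and fill_dummy_values() are not available outside the exercise, it assumes a small invented stand-in DataFrame and a hypothetical re-implementation of fill_dummy_values() that places dummy values just below each column's minimum; your version from the previous exercise may differ.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Invented stand-in for the course's `diabetes` DataFrame (illustration only)
diabetes = pd.DataFrame({
    "BMI":       [33.6, 26.6, np.nan, 28.1, 43.1, np.nan, 31.0, 30.5],
    "Skin_Fold": [35.0, 29.0, 23.0, np.nan, 35.0, np.nan, 32.0, np.nan],
})

def fill_dummy_values(df, scaling_factor=0.075):
    # Hypothetical stand-in for the previous exercise's helper.
    # (rng.random(n) - 2) lies in [-2, -1), so each dummy value falls
    # below the column minimum whenever the column's range is nonzero.
    df_dummy = df.copy()
    rng = np.random.default_rng(0)  # fixed seed, assumed for reproducibility
    for col_name in df_dummy.columns:
        col_null = df_dummy[col_name].isnull()
        col_min = df_dummy[col_name].min()
        col_range = df_dummy[col_name].max() - col_min
        dummy = (rng.random(int(col_null.sum())) - 2) * scaling_factor * col_range + col_min
        df_dummy.loc[col_null, col_name] = dummy
    return df_dummy

# True where Skin_Fold, BMI, or both are missing (computed on the ORIGINAL data)
nullity = diabetes["Skin_Fold"].isnull() | diabetes["BMI"].isnull()

# Replace NaNs so every row has coordinates to plot
diabetes_dummy = fill_dummy_values(diabetes)

# BMI (y) versus Skin_Fold (x); color encodes the nullity flag
ax = diabetes_dummy.plot(x="Skin_Fold", y="BMI", kind="scatter", alpha=0.5,
                         c=nullity, cmap="rainbow")
plt.show()
```

The nullity mask is computed before filling, because afterwards the dummy values would hide which points were originally missing. Rows flagged True and False map to opposite ends of the 'rainbow' colormap, so missing and non-missing points are drawn in clearly different colors.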