Will you delete?
Before deleting rows with missing values, you must consider the factors that justify deletion. The simplest factor is how much data is missing. More complex reasons for missingness may require domain knowledge. In this exercise, you will identify the reason for missingness and then perform the appropriate deletion.
You'll first use msno.matrix() and msno.heatmap() to visualize missingness and the correlation of missingness between variables. You will then determine the pattern in missingness. Lastly, you'll delete depending on the type of missingness.
The diabetes DataFrame has been loaded for you.
Note that we've used a proprietary display() function instead of plt.show() to make it easier for you to view the output.
This exercise is part of the course Dealing with Missing Data in Python.
Have a go at this exercise by completing this sample code.
# Visualize the missingness in the data
___.___(___)
# Display nullity matrix
display("/usr/local/share/datasets/matrix_diabetes.png")
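The decision described in this exercise (check how much is missing, check whether missingness in one column tracks another, then delete selectively) can be sketched numerically with pandas alone. This is a minimal illustration, not the exercise solution: the toy DataFrame, its column names and values, and the 15% cutoff are all assumptions made for the example. The nullity correlation computed here is the quantity that msno.heatmap() is meant to visualize.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the diabetes DataFrame.
# Column names and values are hypothetical, chosen only for illustration.
toy = pd.DataFrame({
    "Glucose": [148.0, 85.0, np.nan, 89.0, 137.0, 116.0, 78.0, 115.0],
    "Insulin": [np.nan, np.nan, np.nan, 94.0, 168.0, np.nan, 88.0, np.nan],
    "BMI": [33.6, 26.6, 23.3, np.nan, 43.1, 25.6, 31.0, 35.3],
})

# Size of the missing data: fraction of missing values per column.
missing_share = toy.isna().mean()

# Correlation between the missingness indicators of each pair of columns.
nullity_corr = toy.isna().corr()

# Assumed rule of thumb for this sketch: delete rows only for columns where
# a small share of values is missing (the 0.15 cutoff is arbitrary).
few_missing = missing_share[(missing_share > 0) & (missing_share <= 0.15)].index.tolist()

# Drop rows that have a missing value in any of those columns.
cleaned = toy.dropna(subset=few_missing, how="any")

print(missing_share)
print(nullity_corr)
print(cleaned)
```

Columns with a large share of missing values (here the hypothetical Insulin column) are left untouched, since deleting their rows would discard most of the data; whether deletion is appropriate at all still depends on the type of missingness you identify.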