
Deduce MNAR

In the previous exercise, you identified the type of missing values from a missingness summary. In this exercise, you'll build on that to confirm that data is Missing Not at Random (MNAR).

The missingness summary for the diabetes DataFrame is shown below.

Your goal is to sort the diabetes DataFrame on Serum_Insulin and identify how the missing values of Skin_Fold correlate with those of Serum_Insulin.
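The reasoning behind this check can be sketched with pandas alone. The toy frame below is made up for illustration and is not the course's diabetes data; only the column names Serum_Insulin and Skin_Fold come from the exercise. Sorting on Serum_Insulin places its missing rows last by default, so if Skin_Fold is missing only where Serum_Insulin is also missing, the gaps in both columns line up at the bottom of the sorted frame.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, NOT the course's diabetes dataset.
toy = pd.DataFrame({
    "Serum_Insulin": [94.0, np.nan, 168.0, np.nan, 88.0, np.nan],
    "Skin_Fold": [23.0, np.nan, 35.0, 29.0, 32.0, np.nan],
})

# sort_values places NaN rows last by default (na_position="last").
sorted_toy = toy.sort_values("Serum_Insulin")

# Nullity correlation: correlate the 0/1 missingness indicators of both columns.
insulin_missing = toy["Serum_Insulin"].isna().astype(int)
skin_missing = toy["Skin_Fold"].isna().astype(int)
nullity_corr = insulin_missing.corr(skin_missing)

# Share of rows with missing Skin_Fold where Serum_Insulin is missing too.
overlap = toy.loc[toy["Skin_Fold"].isna(), "Serum_Insulin"].isna().mean()

print(sorted_toy)
print(f"nullity correlation: {nullity_corr:.2f}")
print(f"overlap: {overlap:.2f}")
```

An overlap close to 1 means Skin_Fold is almost never missing on its own, which is the pattern the sorted nullity matrix is meant to make visible.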

Note that we've used a proprietary display() function instead of plt.show() to make it easier for you to view the output.

This exercise is part of the course

Dealing with Missing Data in Python


Exercise instructions

  • Import the missingno package as msno.
  • Sort the values of the Serum_Insulin column in diabetes.
  • Visualize the missingness summary of Serum_Insulin with msno.matrix().

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Import missingno as msno
___

# Sort diabetes dataframe on 'Serum_Insulin'
sorted_values = ___.___(___)

# Visualize the missingness summary of sorted_values
___.___(___)

# Display nullity matrix
display("/usr/local/share/datasets/matrix_sorted.png")
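One way the blanks above might be filled in is sketched below. It is not the course's reference solution. The small `diabetes` frame is a hypothetical stand-in, since the real dataset is not part of this page, and the import of missingno is guarded because the package may not be installed in every environment. The course's proprietary display() helper is left out.

```python
import numpy as np
import pandas as pd
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

try:
    # Import missingno as msno (third-party package, assumed to be installed)
    import missingno as msno
except ImportError:
    msno = None

# Hypothetical stand-in for the course's `diabetes` DataFrame.
diabetes = pd.DataFrame({
    "Serum_Insulin": [94.0, np.nan, 168.0, np.nan, 88.0, np.nan],
    "Skin_Fold": [23.0, np.nan, 35.0, 29.0, 32.0, np.nan],
})

# Sort diabetes dataframe on 'Serum_Insulin'; missing values end up last
sorted_values = diabetes.sort_values("Serum_Insulin")

# Visualize the missingness summary of sorted_values as a nullity matrix
if msno is not None:
    msno.matrix(sorted_values)
```

Outside the course environment, calling plt.show() from matplotlib.pyplot after msno.matrix() would display the figure when an interactive backend is in use.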