Exercise

Missing values

As in most datasets, missing values in survey data can themselves carry meaning. A substantial share of missing values on an item (more than roughly 5%) may indicate that respondents did not understand the question, did not want to reveal their answer, or didn't find it important.

While this is not a course on dealing with missing values (check out this DataCamp course for that), inspecting missing values is a crucial first step toward survey development. Let's take a closer look at the missing_lots survey, whose items are about 20% missing.

The Hmisc package has been loaded into your environment.

Instructions
  • Print the total number of rows of the missing_lots data frame using the relevant base R function.
  • Get the total number of complete cases in the data frame using the base R functions nrow() and na.omit().
  • Find the total number of missing records for each item using colSums() and is.na().
  • Plot a hierarchical cluster of missing values with naclus() from Hmisc.
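
The four steps above can be sketched as follows. This is a minimal illustration, not the course solution: the real missing_lots data is not shown here, so the code builds a hypothetical stand-in with four made-up items (item_1 to item_4) and 20 missing values per item. The Hmisc step is guarded so the rest of the sketch still runs where the package is not installed.

```r
# Hypothetical stand-in for missing_lots (an assumption, not the course data):
# 100 respondents, 4 Likert-style items, 20 values per item set to NA.
set.seed(42)
missing_lots <- as.data.frame(replicate(4, sample(1:5, 100, replace = TRUE)))
names(missing_lots) <- paste0("item_", 1:4)
for (col in names(missing_lots)) {
  missing_lots[sample(100, 20), col] <- NA
}

# 1. Total number of rows in the data frame
n_rows <- nrow(missing_lots)
print(n_rows)

# 2. Number of complete cases: rows left after dropping any row with an NA
n_complete <- nrow(na.omit(missing_lots))
print(n_complete)

# 3. Number of missing records per item:
#    is.na() gives a logical matrix, colSums() counts the TRUEs by column
n_missing <- colSums(is.na(missing_lots))
print(n_missing)

# 4. Hierarchical cluster of missingness patterns (requires Hmisc)
if (requireNamespace("Hmisc", quietly = TRUE)) {
  plot(Hmisc::naclus(missing_lots))
}
```

Comparing the first two numbers shows how costly listwise deletion would be: even when each item is only about 20% missing, the share of fully complete rows can be much lower, because a row is dropped if any single item is missing.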