
Other summaries of missingness

Some summaries of missingness are particularly useful for specific types of data. Two examples are miss_var_span() and miss_var_run().

  • miss_var_span() calculates the number of missing values in a specified variable within each repeating span of rows. This is useful for time series data, for example to look for weekly (7-day) patterns of missingness.

  • miss_var_run() calculates the length of each "run" or "streak" of missing and complete values in a variable. This is useful for finding unusual patterns of missingness; for example, you might find a repeating pattern of 5 complete values followed by 5 missing values.
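
As a rough illustration of the two summaries described above, here is a minimal sketch on a made-up 14-row data frame. The names `toy`, `day`, and `count` are hypothetical and exist only for this example; the sketch assumes the naniar and dplyr packages are installed.

```r
library(naniar)
library(dplyr)

# Hypothetical two-week series with two short gaps (not from naniar)
toy <- tibble(
  day   = 1:14,
  count = c(5, NA, NA, 7, 8, 9, 3, 4, NA, NA, 6, 2, 1, 9)
)

# One row per span of 7 consecutive rows, with the number of
# missing values that fall inside each span
spans <- miss_var_span(toy, var = count, span_every = 7)
spans

# One row per run: how long it is and whether it is a run of
# missing or complete values
runs <- miss_var_run(toy, var = count)
runs
```

Because the spans are formed from consecutive rows, the data should already be sorted in time order before calling miss_var_span().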

Both miss_var_span() and miss_var_run() work with the group_by() function from dplyr, so the summaries can be calculated separately for each group.
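
A small sketch of the grouped usage, again on made-up data: `toy` and its columns `site` and `count` are hypothetical names chosen for illustration, and naniar and dplyr are assumed to be installed.

```r
library(naniar)
library(dplyr)

# Hypothetical data: two sites with six observations each
toy <- tibble(
  site  = rep(c("a", "b"), each = 6),
  count = c(1, NA, NA, 4, 5, 6, NA, NA, NA, 4, NA, 6)
)

# Runs of missing and complete values, computed within each site
runs_by_site <- toy %>%
  group_by(site) %>%
  miss_var_run(var = count)
runs_by_site

# Missing values per span of 3 rows, computed within each site
spans_by_site <- toy %>%
  group_by(site) %>%
  miss_var_span(var = count, span_every = 3)
spans_by_site
```

Grouping restarts the runs and spans at the start of each group, so a streak of missing values that crosses a group boundary is reported as two separate runs.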

This exercise is part of the course

Dealing With Missing Data in R


Exercise instructions

Using the pedestrian dataset from naniar:

  • Calculate summaries of missingness for the variable hourly_counts using miss_var_span(), with a span of 4000.
  • Calculate the runs of missingness for the variable hourly_counts using miss_var_run().
  • Repeat both summaries for each month by combining them with dplyr's group_by().

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Calculate the summaries for each run of missingness for the variable, hourly_counts
miss_var_run(pedestrian, var = ___)

# Calculate the summaries for each span of missingness, 
# for a span of 4000, for the variable hourly_counts
miss_var_span(pedestrian, var = ___, span_every = ___)

# For each month, calculate the runs of missingness for hourly_counts
pedestrian %>% group_by(month) %>% ___(var = ___)

# For each month, calculate the spans of missingness,
# using a span of 2000, for the variable hourly_counts
pedestrian %>% group_by(___) %>% ___(var = ___, span_every = ___)
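
For reference, one possible way to fill in the blanks is sketched below. This is an assumption based on the comments in the sample code, not an official solution; the result names (`runs`, `spans`, `runs_by_month`, `spans_by_month`) are hypothetical and added only so the outputs can be inspected.

```r
library(naniar)
library(dplyr)

# Runs of missing and complete values in hourly_counts
runs <- miss_var_run(pedestrian, var = hourly_counts)

# Missing values in hourly_counts within each span of 4000 rows
spans <- miss_var_span(pedestrian, var = hourly_counts, span_every = 4000)

# The same run summary, computed separately for each month
runs_by_month <- pedestrian %>%
  group_by(month) %>%
  miss_var_run(var = hourly_counts)

# The span summary per month, using a span of 2000 rows
spans_by_month <- pedestrian %>%
  group_by(month) %>%
  miss_var_span(var = hourly_counts, span_every = 2000)
```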