Other summaries of missingness
Some summaries of missingness are particularly useful for specific types of data. Two examples are miss_var_span() and miss_var_run().
- miss_var_span() calculates the number of missing values in a specified variable for each span of a fixed number of rows. This is useful in time series data, for example to look for weekly (7-day) patterns of missingness.
- miss_var_run() calculates the length of each "run" or "streak" of missing and complete values. This is useful for finding unusual patterns of missingness; for example, you might find a repeating pattern of 5 complete and 5 missing values.
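As a minimal sketch of what the two summaries return, the example below builds a small toy data frame and applies both functions to it. The data frame `toy` and its column `reading` are hypothetical and are not part of naniar or of this exercise.

```r
library(naniar)

# Hypothetical toy data: 20 rows following a repeating pattern of
# 5 complete values and then 5 missing values
toy <- data.frame(
  reading = rep(c(1:5, rep(NA, 5)), times = 2)
)

# One row per streak: the length of the streak and whether it is
# "complete" or "missing"
runs <- miss_var_run(toy, var = reading)
runs

# One row per block of 10 consecutive rows, with the number and
# proportion of missing values in each block
spans <- miss_var_span(toy, var = reading, span_every = 10)
spans
```

With this pattern, `runs` should contain four streaks of length 5 that alternate between complete and missing, and each of the two spans should contain 5 missing values.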
Both miss_var_span() and miss_var_run() work with the group_by() function from dplyr.
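The grouped behaviour can be sketched on a small example. The data frame `toy` and its columns `site` and `reading` are hypothetical, chosen only to show that each summary is computed separately within each group.

```r
library(naniar)
library(dplyr)

# Hypothetical toy data: six readings from each of two sites
toy <- data.frame(
  site    = rep(c("a", "b"), each = 6),
  reading = c(1, NA, NA, 4, 5, 6,
              NA, NA, NA, NA, 11, 12)
)

# Streaks of missing and complete values, computed within each site
runs_by_site <- toy %>%
  group_by(site) %>%
  miss_var_run(var = reading)
runs_by_site

# Missing values per block of 3 rows, with blocks restarting for each site
spans_by_site <- toy %>%
  group_by(site) %>%
  miss_var_span(var = reading, span_every = 3)
spans_by_site
```

Because the data are grouped first, a streak of missing values never continues from one site into the next, and the span counter starts again at 1 for each site.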
This exercise is part of the course
Dealing With Missing Data in R
Exercise instructions
Using the pedestrian dataset from naniar:
- Calculate summaries of missingness for the variable hourly_counts using miss_var_span(), for a span of 4000.
- Calculate summaries of missingness for the variable hourly_counts using miss_var_run().
- Combine both with dplyr's group_by() to calculate the summaries for each month, using a span of 2000 for miss_var_span().
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
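# Load naniar (missingness summaries and the pedestrian data) and dplyr (group_by, %>%)
library(naniar)
library(dplyr)

# Calculate the summaries for each run of missingness for the variable hourly_counts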
miss_var_run(pedestrian, var = ___)
# Calculate the summaries for each span of missingness,
# for a span of 4000, for the variable hourly_counts
miss_var_span(pedestrian, var = ___, span_every = ___)
# For each value of `month`, calculate the runs of missingness for hourly_counts
pedestrian %>% group_by(month) %>% ___()
# For each value of `month`, calculate the spans of missingness,
# for a span of 2000, for the variable hourly_counts
pedestrian %>% group_by(___) %>% ___(var = ___, span_every = ___)