Performing grouped summaries of missingness
Now that you can create nabular data, let's use it to explore the data by calculating summary statistics for one variable based on the missingness of another.
To do this we are going to use the following steps:
First, bind_shadow() turns the data into nabular data.
Next, perform some summaries on the data using group_by() and summarize() to calculate the mean and standard deviation, using the mean() and sd() functions.
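The two steps above can be sketched on a different dataset. This is a minimal illustration, assuming the naniar and dplyr packages are installed; it uses the built-in airquality data (not the course's dataset), where Ozone has missing values and Temp has none. The result name temp_by_ozone_na is only illustrative.

```r
library(naniar)
library(dplyr)

# Assumption: airquality (built into R) stands in for the course data.
temp_by_ozone_na <- airquality %>%
  bind_shadow() %>%                  # adds shadow columns such as Ozone_NA
  group_by(Ozone_NA) %>%             # one group per missingness level of Ozone
  summarize(temp_mean = mean(Temp),  # mean temperature within each group
            temp_sd = sd(Temp))      # standard deviation within each group

temp_by_ozone_na
```

The output has one row per level of Ozone_NA ("!NA" for present, "NA" for missing), so the two rows can be compared directly to see whether temperature differs when ozone is missing.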
This exercise is part of the course Dealing With Missing Data in R.
Exercise instructions
For the oceanbuoys dataset:
bind_shadow(), then group_by() for the missingness of humidity (humidity_NA) and calculate the means and standard deviations for wind east west (wind_ew) using summarize() from dplyr.
Repeat this, but calculating summaries for wind north south (wind_ns).
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# `bind_shadow()` and `group_by()` humidity missingness (`humidity_NA`)
oceanbuoys %>%
  ___() %>%
  group_by(___) %>%
  summarize(wind_ew_mean = mean(___), # calculate mean of wind_ew
            wind_ew_sd = ___(___))    # calculate standard deviation of wind_ew
# Repeat this, but calculating summaries for wind north south (`wind_ns`).
___ %>%
  ___ %>%
  group_by(___) %>%
  summarize(___ = ___(___),
            ___ = ___(___))
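As a cross-check on what a shadow column encodes, the same kind of grouped summary can be sketched with dplyr alone by building the missingness indicator by hand. This is an illustrative sketch, not the course's method: it uses the built-in airquality data, and the names ozone_missing and temp_by_missing are hypothetical.

```r
library(dplyr)

# Assumption: airquality (built into R) is used purely for illustration.
temp_by_missing <- airquality %>%
  mutate(ozone_missing = is.na(Ozone)) %>%  # TRUE where Ozone is missing
  group_by(ozone_missing) %>%
  summarize(temp_mean = mean(Temp),         # mean temperature per group
            temp_sd = sd(Temp),             # standard deviation per group
            n = n())                        # number of rows in each group

temp_by_missing
```

Including the group size n helps judge how much weight to give each summary, since a mean computed from a small "missing" group is less stable than one from a large group.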