Performing grouped summaries of missingness
Now that you can create nabular data, let's use it to explore the data. Let's calculate summary statistics based on the missingness of another variable.
To do this we are going to use the following steps:
First, bind_shadow() turns the data into nabular data.
Next, perform some summaries on the data using group_by() and summarize() to calculate the mean and standard deviation, using the mean() and sd() functions.
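As a minimal sketch of these two steps, the example below applies them to the built-in airquality dataset rather than the exercise data. It assumes the naniar and dplyr packages are installed; the result name temp_by_ozone_na and the summary column names temp_mean and temp_sd are hypothetical, chosen only for illustration.

```r
# Illustrative sketch on `airquality`, not on the exercise's `oceanbuoys` data.
library(dplyr)
library(naniar)

temp_by_ozone_na <- airquality %>%
  bind_shadow() %>%                   # add a shadow column per variable, e.g. Ozone_NA
  group_by(Ozone_NA) %>%              # one group per missingness level: "!NA" and "NA"
  summarize(temp_mean = mean(Temp),   # mean temperature within each group
            temp_sd = sd(Temp))       # standard deviation within each group

temp_by_ozone_na
```

The result has one row per level of Ozone_NA, so the mean and spread of Temp can be compared between rows where Ozone was observed and rows where it was missing. Temp itself has no missing values in airquality; for a variable that does, mean() and sd() would need na.rm = TRUE.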
This exercise is part of the course
Dealing With Missing Data in R
Instructions
For the oceanbuoys dataset:
Use bind_shadow(), then group_by() for the missingness of humidity (humidity_NA), and calculate the means and standard deviations for wind east west (wind_ew) using summarize() from dplyr.
Repeat this, but calculating summaries for wind north south (wind_ns).
Hands-on interactive exercise
Try this exercise by completing this sample code.
# `bind_shadow()` and `group_by()` humidity missingness (`humidity_NA`)
oceanbuoys %>%
  ___() %>%
  group_by(___) %>%
  summarize(wind_ew_mean = mean(___), # calculate mean of wind_ew
            wind_ew_sd = ___(___)) # calculate standard deviation of wind_ew
# Repeat this, but calculating summaries for wind north south (`wind_ns`).
___ %>%
  ___ %>%
  group_by(___) %>%
  summarize(___ = ___(___),
            ___ = ___(___))