Performing grouped summaries of missingness
Now that you can create nabular data, let's use it to explore the data by calculating summary statistics for one variable based on the missingness of another variable.
To do this we are going to use the following steps:
First, bind_shadow() turns the data into nabular data.
Next, perform some summaries on the data using group_by() and summarize() to calculate the mean and standard deviation, using the mean() and sd() functions.
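As a minimal sketch of these two steps, the example below uses the built-in airquality dataset instead of the course data; that choice is an assumption made purely for illustration. It also assumes the naniar and dplyr packages are installed. It summarizes Temp according to whether Ozone is missing.

```r
library(naniar)
library(dplyr)

# airquality ships with base R; Ozone contains missing values, Temp does not.
temp_by_ozone_missingness <- airquality %>%
  bind_shadow() %>%                 # adds shadow columns such as Ozone_NA
  group_by(Ozone_NA) %>%            # one group per missingness state of Ozone
  summarize(temp_mean = mean(Temp), # mean of Temp within each group
            temp_sd = sd(Temp))     # standard deviation of Temp within each group

temp_by_ozone_missingness
```

The result has one row per missingness state of Ozone, so the two means and standard deviations can be compared side by side.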
This exercise is part of the course Dealing With Missing Data in R.
Exercise instructions
For the oceanbuoys dataset:
bind_shadow(), then group_by() for the missingness of humidity (humidity_NA) and calculate the means and standard deviations for wind east west (wind_ew) using summarize() from dplyr.
Repeat this, but calculating summaries for wind north south (wind_ns).
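To see what a shadow column such as humidity_NA holds before grouping on it, the sketch below builds a small, hypothetical data frame (the values are invented and are not taken from oceanbuoys) and counts the rows in each missingness state. It assumes naniar and dplyr are installed.

```r
library(naniar)
library(dplyr)

# Hypothetical toy data for illustration only; not the oceanbuoys values.
toy <- data.frame(
  humidity = c(80.1, NA, 75.3, NA),
  wind_ew  = c(-4.2, -3.9, -5.0, -4.4)
)

# bind_shadow() appends one "_NA" column per original column.
toy_nabular <- bind_shadow(toy)

# Each shadow column records, row by row, whether the original value was missing.
toy_nabular %>%
  count(humidity_NA)
```

Grouping by the shadow column therefore splits the rows into those where humidity was observed and those where it was missing.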
Hands-on interactive exercise
Try this exercise by completing the sample code below.
# `bind_shadow()` and `group_by()` humidity missingness (`humidity_NA`)
oceanbuoys %>%
  ___() %>%
  group_by(___) %>%
  summarize(wind_ew_mean = mean(___), # calculate mean of wind_ew
            wind_ew_sd = ___(___)) # calculate standard deviation of wind_ew
# Repeat this, but calculating summaries for wind north south (`wind_ns`).
___ %>%
  ___ %>%
  group_by(___) %>%
  summarize(___ = ___(___),
            ___ = ___(___))