Performing grouped summaries of missingness

Now that you can create nabular data, let's use it to explore the data by calculating summary statistics of one variable, grouped by the missingness of another variable.

To do this we are going to use the following steps:

  • First, bind_shadow() turns the data into nabular data.

  • Next, use group_by() and summarize() to calculate the mean and standard deviation for each group, using the mean() and sd() functions.
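As a minimal sketch of these two steps, assuming the naniar and dplyr packages are installed, the example below uses R's built-in airquality data rather than the exercise dataset. The choice of dataset and the names summary_by_ozone, temp_mean, and temp_sd are illustrative assumptions, not part of the exercise.

```r
# Sketch only: airquality stands in for the exercise data.
library(dplyr)   # group_by(), summarize(), and the %>% pipe
library(naniar)  # bind_shadow()

summary_by_ozone <- airquality %>%
  bind_shadow() %>%                  # add shadow columns such as Ozone_NA
  group_by(Ozone_NA) %>%             # one group per missingness state of Ozone
  summarize(temp_mean = mean(Temp),  # mean of Temp within each group
            temp_sd = sd(Temp))      # standard deviation of Temp within each group

summary_by_ozone
```

The result should have one row for observations where Ozone is recorded and one for observations where it is missing, so the two sets of Temp summaries can be compared side by side.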

This exercise is part of the course

Dealing With Missing Data in R

Instructions

  • For the oceanbuoys dataset:

  • Apply bind_shadow(), then group_by() the missingness of humidity (humidity_NA), and calculate the mean and standard deviation of east-west wind (wind_ew) using summarize() from dplyr.

  • Repeat this, calculating the same summaries for north-south wind (wind_ns).

Hands-on interactive exercise

Try this exercise by completing this sample code.

# `bind_shadow()` and `group_by()` humidity missingness (`humidity_NA`)
oceanbuoys %>%
  ___() %>%
  group_by(___) %>% 
  summarize(wind_ew_mean = mean(___), # calculate mean of wind_ew
            wind_ew_sd = ___(___)) # calculate standard deviation of wind_ew
  
# Repeat this, but calculating summaries for wind north south (`wind_ns`).
___ %>%
  ___ %>%
  group_by(___) %>%
  summarize(___ = ___(___),
            ___ = ___(___))