
Performing grouped summaries of missingness

Now that you can create nabular data, let's use it to explore the data by calculating summary statistics for one variable, grouped by the missingness of another.

To do this we are going to use the following steps:

  • First, bind_shadow() turns the data into nabular data.

  • Next, perform some summaries on the data using group_by() and summarize() to calculate the mean and standard deviation, using the mean() and sd() functions.
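The two steps above can be sketched on a different dataset, so that the exercise itself stays unsolved. This is a minimal sketch, assuming the naniar and dplyr packages are installed; it uses R's built-in airquality data rather than oceanbuoys, and the names temp_by_ozone_missingness, temp_mean, and temp_sd are illustrative choices, not part of the course.

```r
library(naniar) # provides bind_shadow()
library(dplyr)  # provides %>%, group_by(), summarize()

# Summarize Temp separately for rows where Ozone is missing and where it is not.
temp_by_ozone_missingness <- airquality %>%
  bind_shadow() %>%                    # adds one shadow column per variable, suffixed _NA
  group_by(Ozone_NA) %>%               # group by the missingness of Ozone
  summarize(temp_mean = mean(Temp),    # mean of Temp within each group
            temp_sd = sd(Temp))        # standard deviation of Temp within each group

temp_by_ozone_missingness
```

The result has one row per level of Ozone_NA, which makes it easy to compare Temp between the missing and non-missing groups.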

This exercise is part of the course

Dealing With Missing Data in R


Exercise instructions

  • For the oceanbuoys dataset:

  • Use bind_shadow(), then group_by() the missingness of humidity (humidity_NA), and calculate the mean and standard deviation of east-west wind (wind_ew) using summarize() from dplyr.

  • Repeat this, but calculate the summaries for north-south wind (wind_ns).
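As a cross-check on what grouping by a shadow column does, the same kind of summary can be written in base R by splitting one variable on is.na() of another. This is only a sketch on the built-in airquality data, not the approach the exercise asks for, and the object names are illustrative.

```r
# TRUE where Ozone is missing, FALSE where it is observed
ozone_missing <- is.na(airquality$Ozone)

# Mean and standard deviation of Temp within each missingness group
temp_means <- tapply(airquality$Temp, ozone_missing, mean)
temp_sds <- tapply(airquality$Temp, ozone_missing, sd)

temp_means
temp_sds
```

Each result has two named elements, FALSE and TRUE, which play the same role as the two levels of a shadow column such as humidity_NA.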

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# `bind_shadow()` and `group_by()` humidity missingness (`humidity_NA`)
oceanbuoys %>%
  ___() %>%
  group_by(___) %>% 
  summarize(wind_ew_mean = mean(___), # calculate mean of wind_ew
            wind_ew_sd = ___(___)) # calculate standard deviation of wind_ew
  
# Repeat this, but calculating summaries for wind north south (`wind_ns`).
___ %>%
  ___ %>%
  group_by(___) %>%
  summarize(___ = ___(___),
            ___ = ___(___))