Exercise 5. group_by
Now let's practice using the group_by function.
What we are about to do is a very common operation in data science: you will split a data table into groups and then compute summary statistics for each group.
We will compute the average and standard deviation of systolic blood pressure for females for each age group separately. Remember that the age groups are contained in AgeDecade.
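As a rough illustration of the split-then-summarize pattern described above, here is a minimal sketch on a small made-up table. The data frame toy and its columns group and value are invented for this example and are not part of NHANES.

```r
library(dplyr)

# Hypothetical toy data (not from NHANES): two groups of measurements
toy <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 2, 4, 6)
)

# Split the rows by group, then compute one summary row per group
toy_summary <- toy %>%
  group_by(group) %>%
  summarize(average = mean(value), standard_deviation = sd(value))

toy_summary
```

The result has one row per distinct value of group, with the summary columns named inside summarize.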
This exercise is part of the course
Data Science Visualization - Module 2
Exercise instructions
- Use the functions filter, group_by, summarize, and the pipe %>% to compute the average and standard deviation of systolic blood pressure for females for each age group separately.
- Within summarize, save the average and standard deviation of systolic blood pressure (BPSysAve) as average and standard_deviation.
- Note: ignore warnings about implicit NAs. This warning will not prevent your code from running or being graded correctly.
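The same chain of steps can be sketched on made-up data before trying it on NHANES. In the sketch below, the data frame toy and its columns sex, band, and value are hypothetical stand-ins. It also assumes the measurement column may contain missing values, in which case mean and sd return NA unless na.rm = TRUE is passed.

```r
library(dplyr)

# Hypothetical toy data (not from NHANES), with one missing measurement
toy <- data.frame(
  sex   = c("female", "female", "female", "female", "female", "male"),
  band  = c("20-29", "20-29", "20-29", "30-39", "30-39", "20-29"),
  value = c(110, NA, 114, 120, 124, 130)
)

toy_result <- toy %>%
  filter(sex == "female") %>%          # keep only the rows of interest
  group_by(band) %>%                   # one group per band
  summarize(average = mean(value, na.rm = TRUE),            # drop NAs before averaging
            standard_deviation = sd(value, na.rm = TRUE))   # drop NAs before computing sd

toy_result
```

Without na.rm = TRUE, the group that contains the missing value would report NA for both summaries.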
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
library(dplyr)
library(NHANES)
data(NHANES)
## complete the line with group_by and summarize
NHANES %>%
  filter(Gender == "female") %>%