The pipe: summarising by group
The pipe operator, %>%, takes the result of the left-hand side and uses it as the first argument of the function on the right-hand side. For example:
1:10 %>% mean() # 5.5
The parentheses after the 'target' function (here mean) can be dropped unless you want to pass it additional arguments.
1:10 %>% mean # 5.5
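As a minimal sketch of what "first argument" means in practice, the snippet below compares a direct call with its piped form and then chains a few steps. The vector x and the names direct, piped and chained are made up for illustration; the pipe is loaded here through dplyr, which re-exports it from magrittr.

```r
library(dplyr)  # dplyr re-exports the %>% pipe from magrittr

# Hypothetical example vector with one missing value
x <- c(2, 4, NA, 10)

# Direct call: x is the first argument, na.rm is an additional one
direct <- mean(x, na.rm = TRUE)

# Piped call: x is inserted as the first argument of mean(),
# so the parentheses are needed to pass na.rm
piped <- x %>% mean(na.rm = TRUE)

identical(direct, piped)  # TRUE

# Chaining: each result is passed on to the next function
chained <- c(1, 4, 9, 16) %>% sqrt() %>% sum()
chained  # 10
```

Reading the chain from left to right follows the order in which the operations are carried out, which is the main reason for preferring it over nested calls such as sum(sqrt(...)).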
Chaining operations with the pipe is great fun, so let's try it!
Using the pipe, you'll apply the functions group_by() and summarise() to your data. The first splits the data into groups according to a grouping variable (a factor, for example). The latter can be combined with any summary function, such as mean(), min() or max(), to summarize the data.
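The pattern can be sketched on a small made-up data frame. The data frame scores and its columns group and value are hypothetical and are not part of the alc data used in the exercise.

```r
library(dplyr)

# Hypothetical toy data, not the alc data from the exercise
scores <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(10, 14, 3, 6, 9)
)

# Split the rows by group, then compute one row of summaries per group
by_group <- scores %>%
  group_by(group) %>%
  summarise(
    count = n(),               # number of rows in each group
    mean_value = mean(value),  # average of value within each group
    max_value = max(value)     # largest value within each group
  )

by_group
```

Each argument to summarise() defines one column of the result, so further summaries can be added by listing more name = function(variable) pairs.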
This exercise is part of the course
Helsinki Open Data Science
Exercise instructions
- Access the tidyverse libraries dplyr and ggplot2
- Execute the sample code to see the counts of males and females in the data
- Adjust the code to calculate the mean grade of the students: inside summarise(), after the definition of count, define mean_grade by using mean() on the variable G3.
- Adjust the code: after sex, add high_use as another grouping variable. Execute the code again.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# alc is available
# access the tidyverse libraries dplyr and ggplot2
library(dplyr); library(ggplot2)
# produce summary statistics by group
alc %>% group_by(sex) %>% summarise(count = n())