The pipe: summarising by group
The pipe operator, %>%, takes the result of the left-hand side and uses it as the first argument of the function on the right-hand side. For example:
1:10 %>% mean() # 5.5
The parentheses of the 'target' function (here mean) can be dropped unless you want to pass more arguments to it.
1:10 %>% mean # 5.5
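The short sketch below shows two more things the pipe does. Extra arguments go inside the target function's parentheses and are added after the piped-in value. Several pipes can also be chained, so that each result becomes the first argument of the next function. The vectors used here are made up for illustration.

```r
library(dplyr)  # dplyr re-exports the %>% pipe from magrittr

x <- c(1:10, NA)

# The piped value becomes the first argument; na.rm = TRUE is passed after it
x %>% mean(na.rm = TRUE)            # same as mean(x, na.rm = TRUE); 5.5

# Chained pipes: sqrt() is applied first, then sum() to its result
c(4, 9, 16) %>% sqrt() %>% sum()    # same as sum(sqrt(c(4, 9, 16))); 9
```

The chained version reads left to right in the order the operations happen. The nested call sum(sqrt(...)) has to be read from the inside out.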
Chaining operations with the pipe is great fun, so let's try it!
Using the pipe, you'll apply the functions group_by() and summarise() to your data. The first splits the data into groups according to a grouping variable (a factor, for example). The second can be combined with any summary function, such as mean(), min() or max(), to summarise each group.
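As a minimal sketch of how group_by() and summarise() work together, the example below uses a small hypothetical data frame, students, with a grouping variable sex and a grade column G3. These values are invented and are not the course data. group_by() splits the rows by sex. summarise() then returns one row per group, holding the summary values computed inside it.

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
students <- data.frame(
  sex = c("F", "F", "M", "M", "M"),
  G3  = c(12, 14, 10, 11, 15)
)

by_sex <- students %>%
  group_by(sex) %>%               # one group per distinct value of sex
  summarise(count   = n(),        # number of rows in each group
            mean_G3 = mean(G3),   # summary functions are applied per group
            min_G3  = min(G3),
            max_G3  = max(G3))

by_sex
```

For this toy data the result has two rows. The F row shows count 2 and mean 13, and the M row shows count 3 and mean 12.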
This exercise is part of the course Helsinki Open Data Science.
Exercise instructions
- Access the tidyverse libraries dplyr and ggplot2
- Execute the sample code to see the counts of males and females in the data
- Adjust the code to calculate the means of the students' grades: inside summarise(), after the definition of count, define mean_grade by using mean() on the variable G3.
- Adjust the code: after sex, add high_use as another grouping variable. Execute the code again.
Have a go at this exercise by completing this sample code.
# alc is available
# access the tidyverse libraries dplyr and ggplot2
library(dplyr); library(ggplot2)
# produce summary statistics by group
alc %>% group_by(sex) %>% summarise(count = n())
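The alc data set only exists inside the exercise environment. The sketch below therefore builds a small hypothetical stand-in, alc_demo, with the column names the instructions mention (sex, high_use, G3) and invented values. It shows one possible way to make the two adjustments. The .groups argument is an optional addition that is not part of the exercise. It is available in dplyr 1.0.0 and later, returns an ungrouped result, and suppresses the message dplyr prints when summarising over more than one grouping variable.

```r
library(dplyr)

# Hypothetical stand-in for alc; the values are made up
alc_demo <- data.frame(
  sex      = c("F", "F", "F", "M", "M", "M"),
  high_use = c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE),
  G3       = c(14, 12, 9, 13, 10, 8)
)

# Adjustment 1: define mean_grade after count inside summarise()
alc_demo %>% group_by(sex) %>% summarise(count = n(), mean_grade = mean(G3))

# Adjustment 2: add high_use as a second grouping variable after sex
by_sex_use <- alc_demo %>%
  group_by(sex, high_use) %>%
  summarise(count = n(), mean_grade = mean(G3),
            .groups = "drop")   # optional: drop grouping, silence message

by_sex_use
```

With two grouping variables, summarise() returns one row for each combination of sex and high_use that occurs in the data. In this toy example that gives four rows.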