
The pipe: summarising by group

The pipe operator, %>%, takes the result of the left-hand side and uses it as the first argument of the function on the right-hand side. For example:

1:10 %>% mean() # 5.5

The parentheses of the 'target' function (here mean) can be dropped unless you want to pass it additional arguments.

1:10 %>% mean # 5.5
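To illustrate when the parentheses are needed, here is a minimal sketch. The vector x and the use of na.rm are illustrative choices that do not come from the exercise; %>% is assumed to be available through dplyr.

```r
library(dplyr)  # makes the %>% operator available

# Illustrative vector with a missing value (not part of the exercise data)
x <- c(1:10, NA)

# Without parentheses there is no place for extra arguments,
# so the missing value propagates into the result
x %>% mean                 # NA

# With parentheses, extra arguments follow the piped-in first argument
x %>% mean(na.rm = TRUE)   # 5.5, same as mean(x, na.rm = TRUE)
```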

Chaining operations with the pipe is great fun, so let's try it!

Using the pipe, you'll apply the functions group_by() and summarise() to your data. The first splits the data into groups according to a grouping variable (a factor, for example). The second can be combined with any summary function, such as mean(), min() or max(), to summarise each group.
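As a rough sketch of how the two functions work together, the example below uses a small made-up data frame. The names toy, group and score are hypothetical and are not the course's alc data.

```r
library(dplyr)

# Hypothetical toy data, not the course's alc data set
toy <- data.frame(
  group = factor(c("a", "a", "b", "b", "b")),
  score = c(10, 14, 8, 12, 16)
)

# group_by() marks the groups; summarise() then returns one row per group
by_group <- toy %>%
  group_by(group) %>%
  summarise(count = n(), mean_score = mean(score), max_score = max(score))

by_group
```

The result has one row for each level of group, with one column per summary defined inside summarise().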

This exercise is part of the course

Helsinki Open Data Science


Exercise instructions

  • Access the tidyverse libraries dplyr and ggplot2
  • Execute the sample code to see the counts of males and females in the data
  • Adjust the code to calculate the mean grade of the students: inside summarise(), after the definition of count, define mean_grade by using mean() on the variable G3.
  • Adjust the code: After sex, add high_use as another grouping variable. Execute the code again.
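Grouping by more than one variable follows the same pattern. The sketch below uses made-up data, so the names toy, team, senior and points are hypothetical. It also assumes dplyr 1.0.0 or later, where summarise() accepts the .groups argument.

```r
library(dplyr)

# Hypothetical toy data; the column names are made up for illustration
toy <- data.frame(
  team   = c("x", "x", "x", "y", "y", "y"),
  senior = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE),
  points = c(5, 7, 9, 4, 6, 8)
)

# Listing two variables in group_by() gives one group per combination
by_team_senior <- toy %>%
  group_by(team, senior) %>%
  summarise(count = n(), mean_points = mean(points), .groups = "drop")

by_team_senior
```

Each combination of team and senior that occurs in the data gets its own row, so the summary has four rows here.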

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# alc is available

# access the tidyverse libraries dplyr and ggplot2
library(dplyr); library(ggplot2)

# produce summary statistics by group
alc %>% group_by(sex) %>% summarise(count = n())
