Box plots by groups

Box plots are an excellent way of displaying and comparing distributions. A box plot visualizes the 25th, 50th and 75th percentiles (the box), the typical range (the whiskers) and the outliers of a variable.

The whiskers extending from the box can be computed in several ways. The default (in base R and ggplot2) is to extend each whisker to the most extreme data point that is no more than 1.5*IQR away from the box, where IQR is the interquartile range, defined as

IQR = 75th percentile - 25th percentile

Values outside the whiskers can be considered outliers, that is, unusually distant observations. For more information on IQR, see Wikipedia.
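To make the whisker rule concrete, here is a small sketch in base R. The vector `x` is made-up toy data (not from the course), and the variable names are only illustrative. It computes the quartiles, the IQR, the 1.5*IQR fences, the whisker ends and the outliers by hand.

```r
# Toy data (made up for illustration): eight typical values and one extreme value
x <- c(2, 4, 5, 6, 7, 8, 9, 10, 30)

# Quartiles and the interquartile range
q1  <- unname(quantile(x, 0.25))  # 25th percentile
q3  <- unname(quantile(x, 0.75))  # 75th percentile
iqr <- q3 - q1                    # same value as IQR(x)

# Fences: the furthest a whisker is allowed to reach
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Whiskers end at the most extreme data points inside the fences
whisker_low  <- min(x[x >= lower_fence])
whisker_high <- max(x[x <= upper_fence])

# Points beyond the fences are drawn individually as outliers
outliers <- x[x < lower_fence | x > upper_fence]

print(c(q1 = q1, q3 = q3, iqr = iqr))
print(c(whisker_low = whisker_low, whisker_high = whisker_high))
print(outliers)
```

For these values the quartiles are 5 and 9, so the IQR is 4 and the fences sit at -1 and 15. The whiskers therefore end at 2 and 10, and 30 is flagged as an outlier. Note that base R's boxplot() uses hinges rather than quantile(), so on other data the box edges can differ slightly from this hand calculation.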

This exercise is part of the course

Helsinki Open Data Science


Exercise instructions

  • Initialize a plot of student grades (G3), with high_use grouping the grade distributions on the x-axis. Draw the plot as a box plot.
  • Add an aesthetic element to the plot by defining col = sex inside aes().
  • Define a similar (box) plot of the variable absences grouped by high_use on the x-axis and the aesthetic col = sex.
  • Add a main title to the last plot with ggtitle("title here"). Use "Student absences by alcohol consumption and sex" as a title.
  • Does high use of alcohol have a connection to school absences?
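The steps above can be sketched on a small stand-in data set. The data frame `toy` below is hypothetical (its values are made up and it only mimics the column names high_use, sex and absences used in the exercise), so the resulting picture says nothing about the real alc data. It is only meant to show how col = sex and ggtitle() fit into the plot definition.

```r
library(ggplot2)

# Hypothetical stand-in for the course's alc data (values are made up)
toy <- data.frame(
  high_use = rep(c(FALSE, TRUE), each = 6),
  sex      = rep(c("F", "M"), times = 6),
  absences = c(0, 2, 3, 4, 6, 8, 1, 4, 5, 9, 12, 20)
)

# Initialize the plot: groups on the x-axis, absences on the y-axis, colour by sex
g2 <- ggplot(toy, aes(x = high_use, y = absences, col = sex))

# Define the plot as a box plot and add a main title
g2 <- g2 + geom_boxplot() +
  ggtitle("Student absences by alcohol consumption and sex")

# Draw the plot
print(g2)
```

Putting col = sex inside aes() maps the variable to colour, so each high_use group is split into one box per sex and a legend is added automatically.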

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

library(ggplot2)

# initialize a plot of high_use and G3
g1 <- ggplot(alc, aes(x = high_use, y = G3))

# define the plot as a boxplot and draw it
g1 + geom_boxplot() + ylab("grade")

# initialize a plot of high_use and absences


# define the plot as a boxplot and draw it
