Box plots by groups
Box plots are an excellent way of displaying and comparing distributions. A box plot visualizes the 25th, 50th and 75th percentiles (the box), the typical range (the whiskers) and the outliers of a variable.
The whiskers extending from the box can be computed by several techniques. The default (in base R and ggplot2) is to extend each whisker to the most extreme data point that lies no more than 1.5*IQR away from the box, where IQR is the interquartile range, defined as
IQR = 75th percentile - 25th percentile
Values beyond the whiskers can be considered outliers, that is, unusually distant observations. For more information on the IQR, see Wikipedia.
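To make the 1.5*IQR rule concrete, here is a small worked sketch in R on a made-up vector (the values in x are invented for illustration only). It computes the quartiles with quantile(), derives the IQR and the whisker limits, flags outliers, and then checks the result against base R's boxplot.stats(), which applies the same rule.

```r
# A small made-up sample with one unusually large value
x <- c(2, 4, 4, 5, 6, 7, 8, 9, 30)

# Quartiles (25th, 50th, 75th percentiles) and the interquartile range
q <- quantile(x, c(0.25, 0.5, 0.75))
iqr <- unname(q[3] - q[1])  # same value as IQR(x)

# Points further than 1.5 * IQR from the box are flagged as outliers
lower <- unname(q[1] - 1.5 * iqr)
upper <- unname(q[3] + 1.5 * iqr)
outliers <- x[x < lower | x > upper]

# The whiskers reach the most extreme points that are not outliers
whiskers <- range(x[x >= lower & x <= upper])

print(q)         # 25%: 4, 50%: 6, 75%: 8
print(iqr)       # 4
print(outliers)  # 30
print(whiskers)  # 2 9

# Base R's boxplot.stats() applies the same 1.5 * IQR rule
print(boxplot.stats(x)$stats)  # 2 4 6 8 9
print(boxplot.stats(x)$out)    # 30
```

Note that boxplot.stats() computes the box edges with Tukey's hinges (fivenum()) rather than quantile(), so for some sample sizes the box edges, and therefore the whisker limits, can differ slightly from the quantile-based values. For this sample both methods agree.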
This exercise is part of the course Helsinki Open Data Science.
Exercise instructions
- Initialize a plot of student grades (G3), with high_use grouping the grade distributions on the x-axis. Draw the plot as a box plot.
- Add an aesthetic element to the plot by defining col = sex inside aes().
- Define a similar (box) plot of the variable absences, grouped by high_use on the x-axis and with the aesthetic col = sex.
- Add a main title to the last plot with ggtitle("title here"). Use "Student absences by alcohol consumption and sex" as the title.
- Does high use of alcohol have a connection to school absences?
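Besides reading the boxes by eye, the last question can be approached numerically by computing, for each high_use group, the same statistics a box plot draws. The sketch below uses a tiny hypothetical data frame, alc_demo, with invented values standing in for the course's alc data; with the real data you would pass alc$absences and alc$high_use instead.

```r
# Hypothetical stand-in for the course's alc data (values are made up)
alc_demo <- data.frame(
  high_use = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE),
  absences = c(0, 2, 3, 4, 1, 5, 6, 10)
)

# Quartiles of absences within each high_use group (the box of each group)
group_quartiles <- tapply(alc_demo$absences, alc_demo$high_use, quantile)
print(group_quartiles)

# Group medians, i.e. the middle lines of the two boxes
medians <- tapply(alc_demo$absences, alc_demo$high_use, median)
print(medians)  # FALSE: 2.5, TRUE: 5.5
```

A difference in medians between groups is only descriptive; it does not by itself show that the difference is systematic rather than due to chance.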
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
library(ggplot2)
# initialize a plot of high_use and G3
g1 <- ggplot(alc, aes(x = high_use, y = G3))
# define the plot as a boxplot and draw it
g1 + geom_boxplot() + ylab("grade")
# initialize a plot of high_use and absences
# define the plot as a boxplot and draw it
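For reference, the sketch below shows one way the full set of steps could look when run outside the course environment. Because the course's alc data is not available here, it builds a hypothetical data frame, alc_demo, with randomly generated values; the numbers are arbitrary and say nothing about real students. The plotting calls (aes() with col = sex, geom_boxplot(), ylab(), ggtitle()) are standard ggplot2 functions.

```r
library(ggplot2)

# Hypothetical stand-in for the course's alc data: random values only
set.seed(1)
n <- 200
alc_demo <- data.frame(
  high_use = sample(c(FALSE, TRUE), n, replace = TRUE),
  sex = sample(c("F", "M"), n, replace = TRUE),
  G3 = pmin(20, pmax(0, round(rnorm(n, mean = 11, sd = 3)))),  # grades 0-20
  absences = rpois(n, lambda = 5)                               # counts
)

# Grades grouped by high_use on the x-axis, one box per sex in each group
g1 <- ggplot(alc_demo, aes(x = high_use, y = G3, col = sex))
p1 <- g1 + geom_boxplot() + ylab("grade")

# Absences grouped the same way, with a main title
g2 <- ggplot(alc_demo, aes(x = high_use, y = absences, col = sex))
p2 <- g2 + geom_boxplot() +
  ggtitle("Student absences by alcohol consumption and sex")

# Draw both plots
print(p1)
print(p2)
```

Mapping sex to col inside aes() makes ggplot2 draw separate, side-by-side boxes for each sex within every high_use group, so both groupings can be compared in one panel.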