
Exploring a data frame

Often the most interesting features of your data are the relationships between the variables. If only a handful of variables are saved as columns in a data frame, all of these relationships can be visualized neatly in a single plot.

Base R offers a fast plotting function, pairs(), which draws every pairwise scatter plot of the columns of a data frame and arranges them in a scatter plot matrix. The GGally and ggplot2 libraries together offer a slower but more detailed look at the variables, their distributions, and their relationships.
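
As a minimal sketch of the base R approach, the snippet below uses the built-in iris data frame as a stand-in (an assumption made for illustration; the exercise itself uses learning2014). The name numeric_cols is hypothetical. Species plays the role of the grouping variable that is dropped from the plotted columns and then used for colour.

```r
# Stand-in data: the built-in iris data frame (assumption; the exercise uses learning2014).
# Column 5 (Species) is a factor, so exclude it from the plotted columns.
numeric_cols <- iris[-5]

# Draw one scatter plot for every pair of columns, arranged as a matrix.
pairs(numeric_cols)

# Colour the points by group; the factor is used through its integer codes.
pairs(numeric_cols, col = iris$Species)
```

Negative indexing with [-5] works the same way as [-1] in the exercise code: it drops one column by position and keeps the rest.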

This exercise is part of the course

Helsinki Open Data Science


Exercise instructions

  • Draw a scatter plot matrix of the variables in learning2014 (other than gender).
  • Adjust the code: Add the argument col to the pairs() function, defining the colour with the 'gender' variable in learning2014.
  • Draw the plot again to see the changes.
  • Access the ggplot2 and GGally libraries and create the plot p with ggpairs().
  • Draw the plot. Note that the function is a bit slow.
  • Adjust the argument mapping of ggpairs() by defining col = gender inside aes().
  • Draw the plot again.
  • Adjust the code a little more: add another aesthetic element alpha = 0.3 inside aes().
  • See the difference between the plots?
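
The ggpairs() steps above can be sketched as follows, again with the built-in iris data frame standing in for learning2014 and Species standing in for gender (both are assumptions for illustration, not the course data). The lower argument is copied from the exercise's sample code.

```r
# Assumes the GGally and ggplot2 packages are installed.
library(GGally)
library(ggplot2)

# Stand-in data: built-in iris (assumption); Species takes the place of gender.
# col = Species colours the panels by group; alpha = 0.3 makes the marks semi-transparent.
p <- ggpairs(
  iris,
  mapping = aes(col = Species, alpha = 0.3),
  lower = list(combo = wrap("facethist", bins = 20))
)

# Printing the object draws the plot; this step can take a moment.
p
```

Leaving aes() empty, as in the starting code, draws the same matrix without grouping, which makes the effect of the colour and transparency settings easy to compare.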

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# learning2014 is available

# draw a scatter plot matrix of the variables in learning2014.
# [-1] excludes the first column (gender)
pairs(learning2014[-1])

# access the GGally and ggplot2 libraries
library(GGally)
library(ggplot2)

# create a more advanced plot matrix with ggpairs()
p <- ggpairs(learning2014, mapping = aes(), lower = list(combo = wrap("facethist", bins = 20)))

# draw the plot
