Exploring a data frame
Often the most interesting features of your data are the relationships between the variables. If there are only a handful of variables saved as columns in a data frame, it is possible to visualize all of these relationships neatly in a single plot.
Base R offers a fast plotting function pairs(), which draws all possible scatter plots from the columns of a data frame, resulting in a scatter plot matrix. Libraries GGally and ggplot2 together offer a slow but more detailed look at the variables, their distributions and relationships.
This exercise is part of the course
Helsinki Open Data Science
Exercise instructions
- Draw a scatter matrix of the variables in learning2014 (other than gender)
- Adjust the code: Add the argument col to the pairs() function, defining the colour with the 'gender' variable in learning2014.
- Draw the plot again to see the changes.
- Access the ggplot2 and GGally libraries and create the plot p with ggpairs().
- Draw the plot. Note that the function is a bit slow.
- Adjust the argument mapping of ggpairs() by defining col = gender inside aes().
- Draw the plot again.
- Adjust the code a little more: add another aesthetic element alpha = 0.3 inside aes().
- See the difference between the plots?
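The first two steps can be tried outside the exercise environment with a small stand-in data set. The sketch below is only an illustration: the data frame demo and its values are made up, and its layout (gender in the first column, numeric columns after it) is an assumption about how learning2014 is organised.

```r
# Hypothetical stand-in for learning2014 (made-up values, assumed layout:
# gender in the first column, numeric variables after it).
set.seed(1)
demo <- data.frame(
  gender   = factor(sample(c("F", "M"), 50, replace = TRUE), levels = c("F", "M")),
  attitude = rnorm(50, mean = 3, sd = 0.7),
  deep     = rnorm(50, mean = 3.5, sd = 0.5),
  points   = rnorm(50, mean = 22, sd = 5)
)

# [-1] drops the first column (gender), so only numeric variables are plotted.
numeric_vars <- demo[-1]

# col takes the gender factor: its integer codes pick colours from the palette.
# A character vector such as "F"/"M" would be read as colour names and fail.
point_colours <- demo$gender
pairs(numeric_vars, col = point_colours)
```

With the real data the same call would presumably read pairs(learning2014[-1], col = learning2014$gender), provided gender is stored as a factor.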
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# learning2014 is available
# draw a scatter plot matrix of the variables in learning2014.
# [-1] excludes the first column (gender)
pairs(learning2014[-1])
# access the GGally and ggplot2 libraries
library(GGally)
library(ggplot2)
# create a more advanced plot matrix with ggpairs()
p <- ggpairs(learning2014, mapping = aes(), lower = list(combo = wrap("facethist", bins = 20)))
# draw the plot
p
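The remaining steps (mapping col = gender and alpha = 0.3 inside aes()) might look roughly like the sketch below. It again uses a made-up stand-in data frame, demo, rather than learning2014 itself, and assumes the GGally and ggplot2 packages are installed.

```r
library(GGally)
library(ggplot2)

# Hypothetical stand-in for learning2014 (made-up values, assumed layout).
set.seed(1)
demo <- data.frame(
  gender   = factor(sample(c("F", "M"), 50, replace = TRUE), levels = c("F", "M")),
  attitude = rnorm(50, mean = 3, sd = 0.7),
  deep     = rnorm(50, mean = 3.5, sd = 0.5),
  points   = rnorm(50, mean = 22, sd = 5)
)

# col = gender colours every panel by gender;
# alpha = 0.3 makes the points and density fills semi-transparent.
p <- ggpairs(
  demo,
  mapping = aes(col = gender, alpha = 0.3),
  lower = list(combo = wrap("facethist", bins = 20))
)

# Printing the object draws the plot matrix (this can take a few seconds).
p
```

Keeping gender in the data frame here, unlike in the pairs() call, lets ggpairs() draw the mixed panels (histograms and box plots split by gender) alongside the scatter plots.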