
Exploring a data frame

Often the most interesting features of your data are the relationships between the variables. If a data frame has only a handful of variables saved as columns, all of these relationships can be visualized neatly in a single plot.

Base R offers a fast plotting function, pairs(), which draws a scatter plot for every pair of columns in a data frame and arranges them in a scatter plot matrix. The GGally and ggplot2 libraries together offer a slower but more detailed look at the variables, their distributions and their relationships.
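As a rough illustration of pairs(), the sketch below uses R's built-in iris data frame as a stand-in, since learning2014 is only available inside the exercise. The name numeric_cols is made up for this example, and the Species column plays the role that gender plays in learning2014.

```r
# Stand-in data (assumption): the built-in iris data frame replaces learning2014
data(iris)

# Column 5 (Species) is a factor, so drop it from the plotted columns
numeric_cols <- iris[-5]

# Scatter plot matrix of the four remaining numeric columns
pairs(numeric_cols)

# Same matrix, with points coloured by group;
# a factor passed to col is used as integer colour codes
pairs(numeric_cols, col = iris$Species)
```

With four columns the matrix is a 4 x 4 grid in which each pair of variables appears twice, once on each side of the diagonal. Colouring this way relies on the grouping column being a factor; a character column would probably need to be wrapped in factor() first.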

This exercise is part of the course

Helsinki Open Data Science


Exercise instructions

  • Draw a scatter plot matrix of the variables in learning2014 (other than gender).
  • Adjust the code: Add the argument col to the pairs() function, defining the colour with the 'gender' variable in learning2014.
  • Draw the plot again to see the changes.
  • Access the ggplot2 and GGally libraries and create the plot p with ggpairs().
  • Draw the plot. Note that the function is a bit slow.
  • Adjust the argument mapping of ggpairs() by defining col = gender inside aes().
  • Draw the plot again.
  • Adjust the code a little more: add another aesthetic element alpha = 0.3 inside aes().
  • See the difference between the plots?
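The ggpairs() steps above might look roughly like the following sketch. It assumes the GGally and ggplot2 packages are installed and again uses the built-in iris data frame as a stand-in for learning2014, with Species standing in for gender.

```r
library(GGally)
library(ggplot2)

# Stand-in data (assumption): iris replaces learning2014, Species replaces gender.
# col sets the colour by group; alpha = 0.3 makes the layers semi-transparent.
p <- ggpairs(
  iris,
  mapping = aes(col = Species, alpha = 0.3),
  lower = list(combo = wrap("facethist", bins = 20))
)

# Printing the object draws the plot matrix (this can take a few seconds)
p
```

Setting the aesthetics inside aes() passes them on to every panel of the matrix, so the scatter plots, densities and histograms are all split by group.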

Hands-on interactive exercise

Try this exercise and complete the sample code.

# learning2014 is available

# draw a scatter plot matrix of the variables in learning2014.
# [-1] excludes the first column (gender)
pairs(learning2014[-1])

# access the GGally and ggplot2 libraries
library(GGally)
library(ggplot2)

# create a more advanced plot matrix with ggpairs()
p <- ggpairs(learning2014, mapping = aes(), lower = list(combo = wrap("facethist", bins = 20)))

# draw the plot
