
A biplot of PCA

A biplot is a way of visualizing the connections between two representations of the same data. First, a simple scatter plot is drawn in which the observations are represented by two principal components (PCs). Then, arrows are drawn to visualize the connections between the original variables and the PCs. The following connections hold:

  • The angle between the arrows can be interpreted as the correlation between the variables (a small angle indicates a high positive correlation).
  • The angle between a variable and a PC axis can be interpreted as the correlation between the two.
  • The lengths of the arrows are proportional to the standard deviations of the variables.
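
The link between the arrows and correlations can be checked numerically. The sketch below is only an illustration: it uses the built-in USArrests data as a stand-in (an assumption, since pca_human is not available here), and the object names are made up for this example. For standardized variables, the correlation between a variable and a PC equals the loading multiplied by that PC's standard deviation.

```r
# Illustration only: USArrests is a stand-in data set, not the course data
pca_demo <- prcomp(USArrests, scale. = TRUE)

# correlations between the standardized variables and the first two PCs
cor_var_pc <- cor(scale(USArrests), pca_demo$x[, 1:2])

# loadings multiplied by the standard deviations of the PCs
loadings_scaled <- pca_demo$rotation[, 1:2] %*% diag(pca_demo$sdev[1:2])

# the two matrices agree up to numerical precision
print(round(cor_var_pc, 3))
print(all.equal(unname(cor_var_pc), unname(loadings_scaled)))

# the biplot draws the observations as points and the variables as arrows
biplot(pca_demo, cex = c(0.6, 1), col = c("grey40", "deeppink2"))
```

How exactly angles and arrow lengths appear in the plot also depends on the scaling that biplot() applies, so treat the geometric reading as approximate.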

This exercise is part of the course

Helsinki Open Data Science


Exercise instructions

  • Create and print out a summary of pca_human (created in the previous exercise)
  • Create object pca_pr and print it out
  • Adjust the code: instead of proportions of variance, save the percentages of variance in the pca_pr object. Round the percentages to 1 digit.
  • Execute the paste0() function. Then create a new object pc_lab by assigning the output to it.
  • Draw the biplot again. Use the first value of the pc_lab vector as the label for the x-axis and the second value as the label for the y-axis.
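
As a rough sketch of the same workflow on different data, the steps above might look like this. The built-in USArrests data and the names pca_demo, s_demo, pr_demo and lab_demo are assumptions made for illustration. They are not part of the exercise.

```r
# Illustration only: USArrests stands in for the data behind pca_human
pca_demo <- prcomp(USArrests, scale. = TRUE)

# summary of the PCA; row 2 of $importance holds the proportions of variance
s_demo <- summary(pca_demo)
print(s_demo)

# percentages of variance captured by each PC, rounded to 1 digit
pr_demo <- round(100 * s_demo$importance[2, ], digits = 1)
print(pr_demo)

# axis labels of the form "PC1 (xx.x%)"
lab_demo <- paste0(names(pr_demo), " (", pr_demo, "%)")

# biplot with the first two labels on the x- and y-axes
biplot(pca_demo, cex = c(0.8, 1), col = c("grey40", "deeppink2"),
       xlab = lab_demo[1], ylab = lab_demo[2])
```

Multiplying by 100 before rounding keeps one decimal place of the percentage itself. Rounding the proportion to one digit first would discard most of the information.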

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# pca_human, dplyr are available

# create and print out a summary of pca_human
s <- summary(pca_human)


# rounded percentages of variance captured by each PC
pca_pr <- round(1*s$importance[2, ], digits = 5)

# print out the percentages of variance


# create object pc_lab to be used as axis labels
paste0(names(pca_pr), " (", pca_pr, "%)")

# draw a biplot
biplot(pca_human, cex = c(0.8, 1), col = c("grey40", "deeppink2"), xlab = NA, ylab = NA)


