A biplot of PCA
A biplot visualizes the connections between two representations of the same data. First, a simple scatter plot is drawn in which the observations are represented by two principal components (PCs). Then, arrows are drawn to visualize the connections between the original variables and the PCs. The following connections hold:
- The angle between two arrows can be interpreted as the correlation between the variables: a small angle indicates a high positive correlation.
- The angle between a variable's arrow and a PC axis can be interpreted as the correlation between the two.
- The lengths of the arrows are proportional to the standard deviations of the variables.
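The link between arrows and correlations can be checked numerically. The sketch below is not part of the exercise: it assumes the built-in USArrests dataset as a stand-in for the course data and standardizes the variables with scale. = TRUE. Under that assumption, each loading multiplied by the standard deviation of its PC equals the correlation between the variable and that PC. The names pca, var_pc_cor and direct_cor are illustrative only.

```r
# USArrests ships with R; it is used here only as a stand-in dataset
pca <- prcomp(USArrests, scale. = TRUE)

# Multiply each loading column by the standard deviation of its PC
var_pc_cor <- sweep(pca$rotation, 2, pca$sdev, "*")

# Correlations between the original variables and the PC scores, computed directly
direct_cor <- cor(USArrests, pca$x)

# Variable-to-PC correlations for the two axes shown in a biplot
print(round(var_pc_cor[, 1:2], 2))

# The two matrices agree up to numerical tolerance
print(all.equal(var_pc_cor, direct_cor, check.attributes = FALSE))
```

A value near 1 or -1 in the printed table corresponds to an arrow that lies almost along that PC axis, and a value near 0 to an arrow that is almost perpendicular to it.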
This exercise is part of the course Helsinki Open Data Science.
Exercise instructions
- Create and print out a summary of pca_human (created in the previous exercise).
- Create object pca_pr and print it out.
- Adjust the code: instead of proportions of variance, save the percentages of variance in the pca_pr object. Round the percentages to 1 digit.
- Execute the paste0() function. Then create a new object pc_lab by assigning the output to it.
- Draw the biplot again. Use the first value of the pc_lab vector as the label for the x-axis and the second value as the label for the y-axis.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# pca_human, dplyr are available
# create and print out a summary of pca_human
s <- summary(pca_human)
# rounded percentages of variance captured by each PC
pca_pr <- round(1*s$importance[2, ], digits = 5)
# print out the percentages of variance
# create object pc_lab to be used as axis labels
paste0(names(pca_pr), " (", pca_pr, "%)")
# draw a biplot
biplot(pca_human, cex = c(0.8, 1), col = c("grey40", "deeppink2"), xlab = NA, ylab = NA)
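For comparison, the same sequence of steps can be sketched on a different dataset. The example below is an illustration, not the exercise solution: pca_human is only available inside the exercise environment, so it assumes the built-in USArrests data instead, and the object names ending in _demo are hypothetical.

```r
# Stand-in PCA on a built-in dataset (pca_human is not available here)
pca_demo <- prcomp(USArrests, scale. = TRUE)

# Summary of the PCA; row 2 of $importance holds the proportions of variance
s_demo <- summary(pca_demo)

# Convert proportions to percentages and round to 1 digit
pca_pr_demo <- round(100 * s_demo$importance[2, ], digits = 1)
print(pca_pr_demo)

# Build axis labels of the form "PC1 (..%)"
pc_lab_demo <- paste0(names(pca_pr_demo), " (", pca_pr_demo, "%)")

# Draw the biplot with the first two labels on the x- and y-axes
biplot(pca_demo, cex = c(0.8, 1), col = c("grey40", "deeppink2"),
       xlab = pc_lab_demo[1], ylab = pc_lab_demo[2])
```

Labelling the axes with the percentage of variance makes it clear at a glance how much of the total variation the two plotted components capture.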