PCA with R
Principal Component Analysis (PCA) can be performed with two slightly different matrix decomposition methods from linear algebra: the Eigenvalue Decomposition (of the covariance or correlation matrix) and the Singular Value Decomposition (SVD) (of the data matrix itself).
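To see that the two decompositions lead to the same principal components, the sketch below applies base R's eigen() to the covariance matrix and svd() to the data matrix. It uses the built-in USArrests data as a stand-in (an assumption; the human data is not included here). The eigenvalues equal the squared singular values divided by n - 1, and the eigenvectors match the right singular vectors up to sign.

```r
# Assumption: USArrests stands in for the human data used in the exercise
X <- scale(USArrests)      # standardized n x p data matrix (columns centered)
n <- nrow(X)

# Eigenvalue decomposition of the covariance matrix of X
e <- eigen(cov(X))

# Singular value decomposition of the data matrix X itself
s <- svd(X)

# Eigenvalues of cov(X) equal squared singular values divided by (n - 1)
all.equal(e$values, s$d^2 / (n - 1))

# Eigenvectors and right singular vectors agree up to the sign of each column
all.equal(abs(unname(e$vectors)), abs(s$v))
```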
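There are two functions in R's default installation (in the stats package) that can be used to perform PCA: princomp() and prcomp(). The princomp() function uses the eigenvalue decomposition of the covariance or correlation matrix, while prcomp() uses the SVD of the centered (and possibly scaled) data matrix and is the preferred, more numerically accurate method.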
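A quick comparison of the two functions, again on USArrests as an assumed stand-in: both give the same loadings up to sign, but princomp() divides by n instead of n - 1 when computing the covariance matrix, so its standard deviations come out slightly smaller by a factor of sqrt((n - 1) / n).

```r
# Assumption: USArrests stands in for the human data used in the exercise
X <- scale(USArrests)
n <- nrow(X)

pc_svd   <- prcomp(X)     # SVD of the centered data matrix
pc_eigen <- princomp(X)   # eigen decomposition of the covariance matrix (divisor n)

# Standard deviations differ only by the n versus n - 1 divisor
all.equal(unname(pc_eigen$sdev), pc_svd$sdev * sqrt((n - 1) / n))

# Loadings agree up to the sign of each component
all.equal(abs(unname(unclass(pc_eigen$loadings))), abs(unname(pc_svd$rotation)))
```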
Both methods quite literally decompose a matrix into a product of other matrices, which lets us extract the underlying principal components. This makes it possible to obtain a lower-dimensional approximation of the data by keeping only the first few principal components.
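The idea of a lower-dimensional approximation can be sketched by rebuilding the standardized data from only the first k components (scores times transposed loadings). This minimal example uses USArrests as an assumed stand-in and a hypothetical helper approx_k(); the relative reconstruction error shrinks as k grows and equals the share of variance left in the discarded components.

```r
# Assumption: USArrests stands in for the human data used in the exercise
X  <- scale(USArrests)
pc <- prcomp(X)

# Hypothetical helper: rank-k approximation of the centered data from k components
approx_k <- function(pc, k) {
  scores   <- pc$x[, 1:k, drop = FALSE]         # n x k component scores
  loadings <- pc$rotation[, 1:k, drop = FALSE]  # p x k loadings
  scores %*% t(loadings)
}

# Relative squared Frobenius error of the approximation for k = 1, ..., p
errs <- sapply(seq_len(ncol(X)), function(k) {
  norm(X - approx_k(pc, k), type = "F")^2 / norm(X, type = "F")^2
})
round(errs, 3)
```

Keeping all p components reproduces the data exactly, so the last error is zero up to floating-point noise.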
This exercise is part of the course Helsinki Open Data Science.
Exercise instructions
- Create human_std by standardizing the variables in human.
- Print out summaries of the standardized variables. What are the means? Do you know the standard deviations? (hint: ?scale)
- Use prcomp() to perform principal component analysis on the standardized data. Save the results in the object pca_human.
- Use biplot() to draw a biplot of pca_human.
- Experiment with the argument cex of biplot(). It should be a vector of length 2, and it can be used to scale the labels in the biplot. Try, for example, cex = c(0.8, 1). Which number affects what?
- Add the argument col = c("grey40", "deeppink2")
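The effect of scale() asked about in the instructions can be checked directly. This sketch uses USArrests as an assumed stand-in for human: after scaling, every column has mean 0 (up to floating-point error) and standard deviation 1, and scale() stores the removed means and standard deviations as attributes.

```r
# Assumption: USArrests stands in for human, so X plays the role of human_std
X <- scale(USArrests)

summary(X)                 # per-column summaries of the standardized data
round(colMeans(X), 10)     # column means are 0 up to floating-point error
apply(X, 2, sd)            # column standard deviations are 1
attr(X, "scaled:center")   # original column means subtracted by scale()
attr(X, "scaled:scale")    # original standard deviations divided out by scale()
```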
Have a go at this exercise by completing this sample code.
# the modified human data frame is assumed to be available in the workspace
# standardize the variables (each column gets mean 0 and standard deviation 1)
human_std <- scale(human)
# print out summaries of the standardized variables
summary(human_std)
# perform principal component analysis (with the SVD method)
pca_human <- prcomp(human_std)
# draw a biplot of the principal component representation and the original variables
biplot(pca_human, choices = 1:2, cex = c(0.8, 1), col = c("grey40", "deeppink2"))
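A possible extension of the biplot step, sketched on USArrests as an assumed stand-in for human: label the axes with the percentage of variance each component captures, and apply the cex and col arguments from the instructions. The first element of cex and col applies to the observation labels, the second to the variable arrows and their labels.

```r
# Assumption: USArrests stands in for human, so pca plays the role of pca_human
pca <- prcomp(scale(USArrests))

# Percentage of total variance captured by each principal component
pct <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)
pc_lab <- paste0(colnames(pca$x), " (", pct, "%)")

# cex and col: first element for the observations, second for the variables
biplot(pca, choices = 1:2,
       cex = c(0.8, 1), col = c("grey40", "deeppink2"),
       xlab = pc_lab[1], ylab = pc_lab[2])
```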