PCA with R
Principal Component Analysis (PCA) can be performed with two slightly different matrix decomposition methods from linear algebra: the eigenvalue decomposition and the Singular Value Decomposition (SVD).
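The two routes give the same principal components, and this can be checked numerically. The following is a minimal sketch that uses the built-in USArrests data set as a stand-in for any numeric data matrix (an assumption made only for illustration). The eigenvalues of the sample covariance matrix equal the squared singular values of the centered data divided by n - 1. The eigenvectors match the right singular vectors up to the sign of each column.

```r
# Built-in example data (an assumption for illustration; any numeric matrix works)
X <- as.matrix(USArrests)
n <- nrow(X)

# Center each column so both decompositions describe the same covariance structure
Xc <- scale(X, center = TRUE, scale = FALSE)

# Route 1: eigenvalue decomposition of the sample covariance matrix (divisor n - 1)
eig <- eigen(cov(X), symmetric = TRUE)

# Route 2: singular value decomposition of the centered data matrix, Xc = U D V^T
sv <- svd(Xc)

# Component variances agree: lambda_i = d_i^2 / (n - 1)
variances_eigen <- eig$values
variances_svd <- sv$d^2 / (n - 1)
print(rbind(variances_eigen, variances_svd))

# The principal directions agree up to the sign of each column
print(all.equal(abs(eig$vectors), abs(sv$v)))
```

The SVD route never forms the covariance matrix explicitly. This is the main reason it is considered the more numerically accurate of the two.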
R's built-in stats package provides two functions for PCA. princomp() uses the eigenvalue decomposition of the covariance (or correlation) matrix. prcomp() uses the SVD of the centered (and optionally scaled) data matrix, and it is generally the preferred method because it is more numerically accurate.
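Running both functions on the same data shows how they differ in practice. The following is a minimal sketch that uses the built-in USArrests data, standardized with scale(), as a stand-in for the exercise data (an assumption). By default, princomp() computes the covariance matrix with divisor n rather than n - 1. As a result, its component standard deviations are smaller than those from prcomp() by a factor of sqrt((n - 1) / n). The loadings agree with prcomp()'s rotation matrix up to sign.

```r
# Built-in example data, standardized as in the exercise (stand-in for `human`)
Z <- scale(USArrests)
n <- nrow(Z)

# SVD-based PCA
p_svd <- prcomp(Z)

# Eigendecomposition-based PCA of the covariance matrix
p_eig <- princomp(Z)

# princomp() uses divisor n for the covariance matrix, prcomp() uses n - 1,
# so the component standard deviations differ by a constant factor
sdev_ratio <- unname(p_eig$sdev) / p_svd$sdev
print(sdev_ratio)          # each entry equals sqrt((n - 1) / n) up to rounding
print(sqrt((n - 1) / n))

# The principal directions agree up to the sign of each column
loadings_eig <- unclass(p_eig$loadings)
print(all.equal(abs(loadings_eig), abs(p_svd$rotation), check.attributes = FALSE))
```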
Both methods quite literally decompose a matrix (the covariance matrix, or the data matrix itself) into a product of matrices, which lets us extract the underlying principal components. Keeping only the first few principal components then gives a lower-dimensional approximation of the data.
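The lower-dimensional approximation can be made concrete with prcomp() output. The following is a minimal sketch that uses standardized USArrests data as an assumed stand-in. It multiplies the first k columns of the scores (pca$x) by the transpose of the first k columns of the rotation matrix, which gives a rank-k approximation of the standardized data. The helper approximate() is hypothetical and is defined here only for illustration.

```r
# Standardized example data (stand-in for any numeric data set)
Z <- scale(USArrests)
pca <- prcomp(Z)

# Hypothetical helper: rank-k approximation from the first k scores and loadings
approximate <- function(pca, k) {
  pca$x[, 1:k, drop = FALSE] %*% t(pca$rotation[, 1:k, drop = FALSE])
}

# Reconstruction error (sum of squared differences) shrinks as k grows
errors <- sapply(1:ncol(Z), function(k) sum((Z - approximate(pca, k))^2))
print(errors)

# With every component kept, the data are recovered exactly (up to rounding)
print(all.equal(approximate(pca, ncol(Z)), Z, check.attributes = FALSE))
```

For k components, the squared reconstruction error equals (n - 1) times the sum of the variances of the discarded components. This is why components with small variance can be dropped at little cost.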
This exercise is part of the course Helsinki Open Data Science.
Exercise instructions
- Create human_std by standardizing the variables in human.
- Print out summaries of the standardized variables. What are the means? Do you know the standard deviations? (hint: ?scale)
- Use prcomp() to perform principal component analysis on the standardized data. Save the results in the object pca_human.
- Use biplot() to draw a biplot of pca_human.
- Experiment with the argument cex of biplot(). It should be a vector of length 2 and it can be used to scale the labels in the biplot. Try, for example, cex = c(0.8, 1). Which number affects what?
- Add the argument col = c("grey40", "deeppink2") to biplot() to set the colors of the two sets of labels.
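The standardization in the first two steps can be checked directly. The following is a minimal sketch that uses the built-in USArrests data in place of human (an assumption). scale() subtracts each column's mean and divides by its standard deviation, so every standardized column has mean 0 and standard deviation 1.

```r
# Stand-in data set; in the exercise this would be `human`
X <- USArrests
X_std <- scale(X)

# summary() works on the matrix returned by scale(), one column per variable
print(summary(X_std))

# Column means are zero up to floating-point rounding ...
print(round(colMeans(X_std), 12))
# ... and column standard deviations equal one
print(apply(X_std, 2, sd))

# The same result by hand: subtract column means, then divide by column sds
X_manual <- sweep(sweep(as.matrix(X), 2, colMeans(X)), 2, apply(X, 2, sd), "/")
print(all.equal(X_manual, X_std, check.attributes = FALSE))
```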
Work through the steps above in the sample code below.
# modified human is available
# standardize the variables
human_std <- scale(human)
# print out summaries of the standardized variables
summary(human_std)
# perform principal component analysis (with the SVD method)
pca_human <- prcomp(human_std)
# draw a biplot of the principal component representation and the original variables
biplot(pca_human, choices = 1:2, cex = c(0.8, 1), col = c("grey40", "deeppink2"))
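The share of variance captured by each component helps show how much of the data the two-dimensional biplot represents. The following is a minimal sketch that uses standardized USArrests data as an assumed stand-in for human_std and pca_human. The variance of component i is sdev[i]^2, and summary() on a prcomp object reports the same proportions (rounded) in its importance table.

```r
# Stand-in for human_std and pca_human from the exercise
Z <- scale(USArrests)
pca <- prcomp(Z)

# Variance of each component and its share of the total
variances <- pca$sdev^2
prop_var <- variances / sum(variances)
print(round(100 * prop_var, 1))          # percentage of variance per component
print(round(100 * cumsum(prop_var), 1))  # cumulative percentage

# summary() reports the same proportions, rounded to 5 digits
print(summary(pca)$importance)

# Labels such as "PC1 (62%)" that can be used as biplot axis titles
pc_labels <- paste0(colnames(pca$x), " (", round(100 * prop_var, 1), "%)")
print(pc_labels)
```

biplot() passes xlab and ylab through to the default method. A call such as biplot(pca_human, cex = c(0.8, 1), xlab = pc_labels[1], ylab = pc_labels[2]) should therefore put these percentages on the axes, assuming pc_labels was computed from pca_human.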