PCA with R
Principal Component Analysis (PCA) can be performed by two slightly different matrix decomposition methods from linear algebra: the Eigenvalue Decomposition and the Singular Value Decomposition (SVD).
There are two functions in the default package distribution of R that can be used to perform PCA: princomp() and prcomp(). The prcomp() function uses the SVD and is the preferred, more numerically accurate method.
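As a rough illustration of how the two functions relate, the sketch below runs both on the built-in USArrests data. That data set is only a stand-in chosen for this example, because the human data is not available outside the exercise. princomp() works from the covariance matrix with divisor n, while prcomp() uses n - 1, so the component standard deviations should agree once rescaled. The signs of the loadings may still differ between the two.

```r
# USArrests is a stand-in data set (an assumption, not the course data)
X <- scale(USArrests)
n <- nrow(X)

pc_eigen <- princomp(X)  # eigenvalue decomposition of the covariance matrix
pc_svd   <- prcomp(X)    # singular value decomposition of the data matrix

# princomp() divides by n, prcomp() by n - 1, so rescale before comparing
rescaled <- unname(pc_eigen$sdev) * sqrt(n / (n - 1))
agree <- isTRUE(all.equal(rescaled, pc_svd$sdev))
print(agree)
```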
Both methods quite literally decompose a data matrix into a product of smaller matrices, which lets us extract the underlying principal components. This makes it possible to approximate the data with a lower-dimensional representation by keeping only the first few principal components.
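The decomposition and the low-dimensional approximation can be sketched directly with base R. The example below again uses USArrests as a stand-in data set (an assumption made for illustration). The choice of k = 2 components is arbitrary.

```r
# Stand-in data (assumption): standardized USArrests
X <- scale(USArrests)
n <- nrow(X)

# Route 1: eigenvalue decomposition of the covariance matrix
eig <- eigen(cov(X))

# Route 2: singular value decomposition of the (centered) data matrix
sv <- svd(X)

# Both routes give the same component variances as prcomp()
pca <- prcomp(X)
same_eigen <- isTRUE(all.equal(eig$values, pca$sdev^2))
same_svd   <- isTRUE(all.equal(sv$d^2 / (n - 1), pca$sdev^2))

# Rank-k approximation: keep only the first k components
k <- 2
X_k <- pca$x[, 1:k] %*% t(pca$rotation[, 1:k])

# The squared error equals the variance carried by the dropped components
approx_error <- sum((X - X_k)^2)
dropped      <- (n - 1) * sum(pca$sdev[(k + 1):ncol(X)]^2)
```

Comparing approx_error with dropped shows that the information lost is exactly what the discarded components carried.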
This exercise is part of the course Helsinki Open Data Science.
Exercise instructions
- Create human_std by standardizing the variables in human.
- Print out summaries of the standardized variables. What are the means? Do you know the standard deviations? (hint: ?scale)
- Use prcomp() to perform principal component analysis on the standardized data. Save the results in the object pca_human.
- Use biplot() to draw a biplot of pca_human.
- Experiment with the argument cex of biplot(). It should be a vector of length 2 and it can be used to scale the labels in the biplot. Try for example cex = c(0.8, 1). Which number affects what?
- Add the argument col = c("grey40", "deeppink2").
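For the standardization step, a minimal sketch of what scale() does with its default arguments may help. It uses the built-in USArrests data as a stand-in (an assumption, since human is only available inside the exercise): each column is centered on its mean and divided by its standard deviation.

```r
# Stand-in data (assumption): USArrests instead of human
X_std <- scale(USArrests)

# After standardizing, every column mean is (numerically) zero ...
col_means <- colMeans(X_std)
print(round(col_means, 10))

# ... and every column standard deviation is one
col_sds <- apply(X_std, 2, sd)
print(col_sds)

# summary() reports the same means alongside quartiles
summary(X_std)
```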
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# modified human is available
# standardize the variables
human_std <- scale(human)
# print out summaries of the standardized variables
summary(human_std)
# perform principal component analysis (with the SVD method)
pca_human <- prcomp(human_std)
# draw a biplot of the principal component representation and the original variables
biplot(pca_human, choices = 1:2)
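To see how the cex and col arguments from the instructions fit together, here is a hedged sketch on the stand-in USArrests data (an assumption, not the course's human data). It also prints the share of variance captured by each component, which helps when reading the biplot axes.

```r
# Stand-in data (assumption): standardized USArrests
pca <- prcomp(scale(USArrests))

# Proportion of total variance captured by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(100 * var_explained, 1))

# In biplot(), the first element of cex and col applies to the
# observation labels, the second to the variable labels and arrows
biplot(pca, choices = 1:2,
       cex = c(0.8, 1),
       col = c("grey40", "deeppink2"))
```

Shrinking the first cex value keeps many observation labels from overwhelming the variable arrows.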