PCA with R

Principal Component Analysis (PCA) can be performed with two slightly different matrix decomposition methods from linear algebra: the Eigenvalue Decomposition and the Singular Value Decomposition (SVD).

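Two functions in the default distribution of R (both in the stats package) can be used to perform PCA: princomp() and prcomp(). The princomp() function uses the eigenvalue decomposition of the covariance or correlation matrix, while prcomp() uses the SVD of the data matrix and is the preferred method for numerical accuracy.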

Both methods quite literally decompose a matrix into a product of simpler matrices, which lets us extract the underlying principal components. This makes it possible to approximate the data with a lower-dimensional representation by keeping only the first few principal components.
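To make the link between the two decompositions concrete, here is a minimal sketch. It assumes the built-in USArrests dataset as a stand-in (it is not the course's human data), and the object names are illustrative only. It compares the component variances obtained from eigen(), svd() and prcomp(), and then builds a rank-2 approximation of the standardized data from the first two components.

```r
# USArrests ships with R; it is used here only as a stand-in dataset (assumption)
X <- scale(USArrests)

# Route 1: eigenvalue decomposition of the covariance matrix of the standardized data
eig <- eigen(cov(X))

# Route 2: singular value decomposition of the standardized data matrix itself
sv <- svd(X)

# Component variances from both routes
var_eig <- eig$values
var_svd <- sv$d^2 / (nrow(X) - 1)

# prcomp() reports component standard deviations, so square them to compare
pca <- prcomp(X)

print(all.equal(var_eig, var_svd))
print(all.equal(pca$sdev^2, var_svd))

# Lower-dimensional approximation: keep only the first k components
k <- 2
X_approx <- sv$u[, 1:k] %*% diag(sv$d[1:k]) %*% t(sv$v[, 1:k])

# Squared reconstruction error of the rank-k approximation
print(sum((X - X_approx)^2))
```

The reconstruction error equals the sum of the squared singular values that were dropped, which is why discarding components with small singular values loses little information.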

This exercise is part of the course

Helsinki Open Data Science


Exercise instructions

Create human_std by standardizing the variables in human.
Print out summaries of the standardized variables. What are the means? Do you know the standard deviations? (hint: ?scale)
Use prcomp() to perform principal component analysis on the standardized data. Save the results in the object pca_human.
Use biplot() to draw a biplot of pca_human (Click next to "Plots" to view it larger)
Experiment with the argument cex of biplot(). It should be a vector of length 2 and it can be used to scale the labels in the biplot. Try for example cex = c(0.8, 1). Which number affects what?
Add the argument col = c("grey40", "deeppink2") to the biplot() call.
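One way to check what scale() does is sketched below. The example assumes the built-in USArrests dataset as a stand-in for human, and the names X_std, col_means and col_sds are made up for illustration. With its default arguments, scale() subtracts each column's mean and divides by its standard deviation.

```r
# Stand-in data (assumption): USArrests ships with R, unlike the course's `human`
X_std <- scale(USArrests)

# summary() works on the matrix returned by scale(), one block per column
print(summary(X_std))

# Column means and standard deviations after standardization
col_means <- colMeans(X_std)
col_sds <- apply(X_std, 2, sd)

print(round(col_means, 10))
print(col_sds)
```

The same two checks can be run on human_std once it has been created.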

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# modified human is available

# standardize the variables
human_std <- scale(human)

# print out summaries of the standardized variables


# perform principal component analysis (with the SVD method)
pca_human <- prcomp(human_std)

# draw a biplot of the principal component representation and the original variables
biplot(pca_human, choices = 1:2)
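
For readers without access to the course data, the same workflow can be sketched on a dataset that ships with R. USArrests and the object name pca_demo are assumptions made for this illustration; the cex and col values are the ones suggested in the instructions.

```r
# Stand-in data (assumption): standardize a built-in dataset instead of `human`
pca_demo <- prcomp(scale(USArrests))

# Standard deviation and proportion of variance for each principal component
print(summary(pca_demo))

# Biplot of the first two components; cex and col each take a vector of length 2
biplot(pca_demo, choices = 1:2,
       cex = c(0.8, 1),
       col = c("grey40", "deeppink2"))
```

Changing one element of cex or col at a time and redrawing the plot is a quick way to see which part of the biplot each element controls.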
