Interpreting PCA results

Now you'll use some visualizations to better understand your PCA model. You were introduced to one of these visualizations, the biplot, in an earlier chapter.

You'll run into some common challenges when using biplots on real-world data with a non-trivial number of observations and variables, and then you'll look at some alternative visualizations. You are encouraged to experiment with additional visualizations before moving on to the next exercise.

This exercise is part of the course Unsupervised Learning in R.

Exercise instructions

The variables you created before, wisc.data, diagnosis, and wisc.pr, are still available.

  • Create a biplot of the wisc.pr model. What stands out to you about this plot? Is it easy or difficult to understand? Why?
  • Execute the code to scatter plot each observation by principal components 1 and 2, coloring the points by the diagnosis.
  • Repeat the same for principal components 1 and 3. What do you notice about these plots?
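
Before working on the Wisconsin data, it may help to see the same three steps on a smaller stand-in. The sketch below is only an illustration: it uses R's built-in iris data set rather than wisc.data, and the names iris.data, iris.pr, and species are made up for this example. It assumes the scores you want to plot are stored in the x element of the object returned by prcomp().

```r
# Stand-in data: the built-in iris measurements (NOT the course's wisc.data)
iris.data <- as.matrix(iris[, 1:4])
species <- as.numeric(iris$Species)  # 1, 2, 3: usable directly as colors

# Fit a PCA on centered and scaled variables
iris.pr <- prcomp(iris.data, center = TRUE, scale. = TRUE)

# Biplot: observation labels and variable arrows share a single panel
biplot(iris.pr)

# The x element holds the scores: one row per observation, one column per PC
plot(iris.pr$x[, c(1, 2)], col = species,
     xlab = "PC1", ylab = "PC2")

# Same idea for components 1 and 3
plot(iris.pr$x[, c(1, 3)], col = species,
     xlab = "PC1", ylab = "PC3")
```

Even with 150 observations the row labels in the biplot overlap, while the colored scatter plots make it easier to see how well the groups separate on each pair of components.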

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create a biplot of wisc.pr


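# Scatter plot observations by components 1 and 2
# (assuming diagnosis is coded 0/1, adding 1 avoids color index 0,
# which is the background color and would hide those points)
plot(wisc.pr$___[, c(1, 2)], col = (diagnosis + 1),
     xlab = "PC1", ylab = "PC2")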

# Repeat for components 1 and 3
plot(___, col = (diagnosis + 1), 
     xlab = "PC1", ylab = "PC3")

# Do additional data exploration of your choosing below (optional)
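
For the optional exploration, one possibility is a scatter plot matrix of the first few components, which shows several pairs of components at once. This is a rough sketch under the same assumptions as any stand-in example: it uses the built-in iris data instead of wisc.data, and the names pr and groups are hypothetical.

```r
# Stand-in data again (built-in iris), since wisc.data is not defined here
pr <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
groups <- as.numeric(iris$Species)

# Scatter plot matrix of the first three principal components,
# with points colored by group
pairs(pr$x[, 1:3], col = groups,
      main = "First three principal components")

# The rotation element holds the loadings: how strongly each original
# variable contributes to each component
round(pr$rotation[, 1:2], 2)
```

With your own objects you would swap in wisc.pr and a color vector built from diagnosis.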