Interpreting PCA results
Now you'll use some visualizations to better understand your PCA model. You were introduced to one of these visualizations, the biplot, in an earlier chapter.
You'll run into some common challenges with using biplots on real-world data containing a non-trivial number of observations and variables, then you'll look at some alternative visualizations. You are encouraged to experiment with additional visualizations before moving on to the next exercise.
This exercise is part of the course
Unsupervised Learning in R
Exercise instructions
The variables you created before, wisc.data
, diagnosis
, and wisc.pr
, are still available.
- Create a biplot of the
wisc.pr
data. What stands out to you about this plot? Is it easy or difficult to understand? Why? - Execute the code to scatter plot each observation by principal components 1 and 2, coloring the points by the diagnosis.
- Repeat the same for principal components 1 and 3. What do you notice about these plots?
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create a biplot of wisc.pr
# Scatter plot observations by components 1 and 2
plot(wisc.pr$___[, c(1, 2)], col = (diagnosis + 1),
xlab = "PC1", ylab = "PC2")
# Repeat for components 1 and 3
plot(___, col = (diagnosis + 1),
xlab = "PC1", ylab = "PC3")
# Do additional data exploration of your choosing below (optional)