1. Visualizing and interpreting PCA results
Great work on making your first PCA model. In this video, we will explore some additional visualizations often used to understand PCA models.
2. Biplot
The first of the two visualizations is known as a biplot. This plot shows all of the original observations as points plotted in the first two principal components. A biplot also shows the original features as vectors mapped onto the first two principal components.
In this case, the vectors for the original features petal length and petal width point in the same direction in the first two principal components, indicating that these two features are positively correlated in the original data.
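As a rough check of that reading, the sketch below compares the direction of the two petal vectors with the correlation of the features themselves. It assumes the built-in 'iris' data stands in for the data on the slide; the object names 'pr.iris', 'petal.loadings', 'arrow.cosine' and 'petal.cor' are illustrative and do not come from the video.

```r
# Assumption: the built-in iris data stands in for the data shown on the slide.
pr.iris <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Loadings of the two petal features on PC1 and PC2. A biplot draws the
# feature vectors from these values, up to a per-component rescaling.
petal.loadings <- pr.iris$rotation[c("Petal.Length", "Petal.Width"), 1:2]
print(petal.loadings)

# Cosine of the angle between the two loading vectors
# (a value close to 1 means they point in the same direction).
arrow.cosine <- sum(petal.loadings[1, ] * petal.loadings[2, ]) /
  sqrt(sum(petal.loadings[1, ]^2) * sum(petal.loadings[2, ]^2))

# Correlation of the two features in the original data.
petal.cor <- cor(iris$Petal.Length, iris$Petal.Width)

print(c(arrow.cosine = arrow.cosine, correlation = petal.cor))
```

If both numbers are close to 1, the biplot and the raw data tell the same story.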
3. Scree plot
The second type of plot is a scree plot. A scree plot for PCA shows either the proportion of variance explained by each principal component, as on the left-hand side here, or the cumulative percentage of variance explained as the number of principal components increases, as on the right-hand side. The cumulative curve reaches 100% once the number of principal components equals the number of features in the original data.
4. Biplots in R
Creating a biplot in R from the results of 'prcomp' only requires passing the PCA model into the function 'biplot'.
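A minimal sketch of that call is shown here. It assumes the model is fit on the four numeric columns of the built-in 'iris' data with centering and scaling; the name 'pr.iris' is illustrative and may differ from the model used in the video.

```r
# Assumption: fit the PCA model on the numeric iris measurements,
# centering and scaling each feature first.
pr.iris <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Observations appear as points and the original features as vectors,
# both drawn on the first two principal components.
biplot(pr.iris)
```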
5. Scree plots in R
Building up the scree plots requires a few additional steps. First, the standard deviation of each principal component is accessed through the 'sdev' component of the PCA model. Because we want variance rather than standard deviation, and variance is the square of the standard deviation, we square each element of 'sdev'.
Finally, the proportion of variance explained by each principal component is determined by dividing its variance by the total variance across all components. This is then plotted using R's base plot,
6. Scree plot
or your favorite plotting function.
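The steps just described can be sketched as follows. This is an illustration under assumptions rather than the exact code from the slides: it uses the built-in 'iris' data, and the names 'pr.iris', 'pr.var' and 'pve' are chosen here for readability.

```r
# Assumption: PCA model fit on the numeric iris measurements.
pr.iris <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Variance of each principal component is the square of its standard deviation.
pr.var <- pr.iris$sdev^2

# Proportion of variance explained by each principal component.
pve <- pr.var / sum(pr.var)

# Draw the two scree plots side by side using base graphics.
par(mfrow = c(1, 2))
plot(pve,
     xlab = "Principal component",
     ylab = "Proportion of variance explained",
     ylim = c(0, 1), type = "b")
plot(cumsum(pve),
     xlab = "Principal component",
     ylab = "Cumulative proportion of variance explained",
     ylim = c(0, 1), type = "b")
```

The left panel shows the contribution of each component, and the right panel rises to 1 at the last component.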
7. Let's practice!
OK, let's get started creating some visualizations to help interpret PCA models. We will guide you along the way.