Interpreting PCA attributes

1. Interpreting PCA outputs

Now that we know how to use the princomp() function to calculate the PCs and choose an appropriate number of components, we will explore the contents of the principal component object.

2. Attributes of princomp object

Recall that we created the cars.pca object by calling princomp() on the mtcars.sub dataset with cor and scores both set to TRUE. Applying the attributes() function to cars.pca reveals the contents of the object, including the loadings, the means, and the scores.
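As a reminder of the setup, here is a minimal sketch. It assumes mtcars.sub is the built-in mtcars data without the binary vs and am columns, which matches the 9 variables discussed in this lesson; that definition is an assumption, not shown in this video.

```r
# Assumption: mtcars.sub drops columns 8 (vs) and 9 (am) from mtcars.
mtcars.sub <- mtcars[, -c(8, 9)]

# PCA on the correlation matrix, keeping the scores.
cars.pca <- princomp(mtcars.sub, cor = TRUE, scores = TRUE)

# List the components stored in the object and its class.
attributes(cars.pca)
```

In the output, the variable means are stored under the name center.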

3. Interpretation of loadings

In particular, cars.pca$loadings, or equivalently loadings(cars.pca), shows the weights that were used to construct the PCs. Note that the blank entries in the printed loadings matrix are values close to zero (below the print cutoff, which is 0.1 by default). They are omitted to give a cleaner interpretation, which will be discussed later.
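A short sketch of both ways to get the loadings, again assuming mtcars.sub is mtcars without vs and am. The cutoff argument of the print method controls which small values are blanked out.

```r
# Assumption: mtcars.sub drops columns 8 (vs) and 9 (am) from mtcars.
mtcars.sub <- mtcars[, -c(8, 9)]
cars.pca <- princomp(mtcars.sub, cor = TRUE, scores = TRUE)

# Two equivalent ways to access the loadings.
cars.pca$loadings
loadings(cars.pca)

# Print every weight, including the small ones that are blank by default.
print(cars.pca$loadings, cutoff = 0)
```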

4. Geometry of loadings - numerical values

If we choose to retain the first two PCs, we can print the first two columns of cars.pca$loadings. We will now plot the PC loadings and provide a more detailed interpretation of the weights.
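Extracting the first two columns could look like the sketch below (same assumption about mtcars.sub). It also checks that each loading vector has unit length, which is how princomp() normalizes the weights.

```r
# Assumption: mtcars.sub drops columns 8 (vs) and 9 (am) from mtcars.
mtcars.sub <- mtcars[, -c(8, 9)]
cars.pca <- princomp(mtcars.sub, cor = TRUE, scores = TRUE)

# Loadings of the first two PCs as a plain numeric matrix.
load2 <- cars.pca$loadings[, 1:2]
round(load2, 3)

# The squared weights in each column sum to 1.
colSums(load2^2)
```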

5. Geometry of loadings - plot

The biplot() function simultaneously shows the loadings and the scores on the first two components. To highlight the loadings and de-emphasize the scores, we specify the col and cex arguments so that the loadings appear in a brighter color and a larger font than the scores. Based on the plot and the numerical values, we can interpret the result as follows. Component 1 is a contrast between mpg, drat, qsec, and gear, which have positive loadings, and the other five variables, which have negative loadings. Component 2 has very small loadings on the first three variables, which appear as blanks in the loadings matrix, so it is a weighted sum of the other six variables, with negative weights on all of them.
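One way to emphasize the loadings is sketched below. In biplot(), the first element of col and cex applies to the scores and the second to the loadings; the particular colors and sizes here are illustrative choices, and mtcars.sub is again assumed to be mtcars without vs and am.

```r
# Assumption: mtcars.sub drops columns 8 (vs) and 9 (am) from mtcars.
mtcars.sub <- mtcars[, -c(8, 9)]
cars.pca <- princomp(mtcars.sub, cor = TRUE, scores = TRUE)

# Scores in small gray text, loadings in larger blue text.
biplot(cars.pca, col = c("gray", "steelblue"), cex = c(0.5, 1.3))
```

Note that the overall sign of a component is arbitrary, so another R setup may show all signs of a component flipped; the contrast between the two groups of variables is what matters.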

6. PCA scores

The scores are the projections of the original observations onto the principal components. The mtcars.sub dataset has 9 variables, so there are 9 PCs and each observation has 9 scores, one per PC. They are stored in cars.pca$scores.
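To make the projection concrete, the sketch below rebuilds the scores by hand: it standardizes the data with the center and scale stored in the object and multiplies by the loadings. It assumes mtcars.sub is mtcars without vs and am.

```r
# Assumption: mtcars.sub drops columns 8 (vs) and 9 (am) from mtcars.
mtcars.sub <- mtcars[, -c(8, 9)]
cars.pca <- princomp(mtcars.sub, cor = TRUE, scores = TRUE)

# One row per car, one column per PC.
dim(cars.pca$scores)

# Standardize with the stored center and scale, then project on the loadings.
z <- scale(mtcars.sub, center = cars.pca$center, scale = cars.pca$scale)
manual.scores <- z %*% unclass(cars.pca$loadings)

# The manual projection agrees with the stored scores.
all.equal(manual.scores, cars.pca$scores, check.attributes = FALSE)
```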

7. PCA scores on first two components

The scores on the first two components are widely used as a lower-dimensional representation of the original high-dimensional data because they are easy to visualize. They are the first two columns of cars.pca$scores.

8. Calculating, visualizing and interpreting scores

The simplest way to visualize the scores in the first two dimensions is to use the biplot() function on cars.pca, this time suppressing the loadings through the col and cex arguments.
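A possible way to do this is sketched below: the loadings are drawn in white at a tiny size, and var.axes = FALSE drops their arrows. These specific values are illustrative rather than the ones used in the video, and mtcars.sub is assumed to be mtcars without vs and am.

```r
# Assumption: mtcars.sub drops columns 8 (vs) and 9 (am) from mtcars.
mtcars.sub <- mtcars[, -c(8, 9)]
cars.pca <- princomp(mtcars.sub, cor = TRUE, scores = TRUE)

# The scores on the first two components, as numbers.
head(cars.pca$scores[, 1:2])

# First element of col and cex: scores; second element: loadings.
biplot(cars.pca, col = c("steelblue", "white"), cex = c(0.8, 0.01),
       var.axes = FALSE)
```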

9. Plotting scores using ggplot

The ggplot() function gives us more control over the plot of the scores. We first create a data frame with all the scores and then use Comp.1 and Comp.2 to plot the scores on the first two components. We can label the points by their row names with label = rownames(scores).

10. Plotting and coloring scores using ggplot

Using the argument color = cylinder, we can also color the points by the cylinder variable, after first defining it to be a factor.
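Putting the labeling and the coloring together, a sketch could look like this. It assumes the ggplot2 package is installed, that mtcars.sub is mtcars without vs and am, and that cylinder is built from the cyl column; the use of geom_text() and its size are illustrative choices.

```r
library(ggplot2)

# Assumption: mtcars.sub drops columns 8 (vs) and 9 (am) from mtcars.
mtcars.sub <- mtcars[, -c(8, 9)]
cars.pca <- princomp(mtcars.sub, cor = TRUE, scores = TRUE)

# Data frame of all scores; columns are named Comp.1, Comp.2, ...
scores <- data.frame(cars.pca$scores)

# Treat the number of cylinders as a categorical variable.
cylinder <- factor(mtcars.sub$cyl)

# Plot car names at their scores on the first two PCs, colored by cylinder.
p <- ggplot(scores, aes(x = Comp.1, y = Comp.2,
                        label = rownames(scores), color = cylinder)) +
  geom_text(size = 3)
print(p)
```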

11. Using the factoextra library

Another way to visualize the loadings and/or scores of a PCA output is to use functions from the factoextra library. Let's discuss three functions from the library, all of which start with the prefix fviz_pca.

12. Using the factoextra library

The fviz_pca_biplot() function shows the loadings and scores on the same plot, along with the percentage of variation explained by each of the two components.

13. fviz_pca_ind function

The fviz_pca_ind() function, on the other hand, plots only the scores.

14. fviz_pca_var function

And the fviz_pca_var() function displays only the loadings.
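The three functions can be called directly on the princomp object, as in the sketch below. It assumes the factoextra package is installed and that mtcars.sub is mtcars without vs and am; all other arguments are left at their defaults.

```r
library(factoextra)

# Assumption: mtcars.sub drops columns 8 (vs) and 9 (am) from mtcars.
mtcars.sub <- mtcars[, -c(8, 9)]
cars.pca <- princomp(mtcars.sub, cor = TRUE, scores = TRUE)

# Loadings and scores together.
p.biplot <- fviz_pca_biplot(cars.pca)
print(p.biplot)

# Scores (individuals) only.
p.ind <- fviz_pca_ind(cars.pca)
print(p.ind)

# Loadings (variables) only.
p.var <- fviz_pca_var(cars.pca)
print(p.var)
```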

15. Let's practice these functions!

Now it's your turn to use these functions to analyze the state.x77 data.