Principal Component Analysis

1. Principal Component Analysis

Great work so far! In this last lesson of chapter 2, we will review principal component analysis.

2. Principal Component Analysis

Principal component analysis (PCA) is a technique to reduce the number of dimensions of your dataset. A dimension in this context means a variable or a column. PCA is usually used when there are lots of variables in a dataset and many of them are correlated, that is, they tell the same story.

3. Principal Component Analysis

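PCA enables you to create a new dataset with the same number of variables as your original dataset. The new variables, called principal components, are ordered so that the first one explains as much of the variance as possible, and each subsequent one explains as much of the remaining variance as possible.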

4. Principal Component Analysis

In the example, the first principal component explains almost 98% of the variance,

5. Principal Component Analysis

the second over 2%,

6. Principal Component Analysis

and the third close to zero variance.

7. Principal Component Analysis

This tells us that you could use only two

8. Principal Component Analysis

or even only one variable instead of three and still capture most of the information in the data.
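As a minimal sketch of this idea, here is how you might decide how many components to keep. The toy data, the seed, and the 95% threshold are all assumptions made up for illustration, not the example from the slides, and the code uses the prcomp function that is covered later in this lesson.

```r
# Hypothetical toy data: three variables that largely tell the same story
set.seed(42)                               # arbitrary seed, for reproducibility
signal <- rnorm(200)
toy <- data.frame(
  a = signal + rnorm(200, sd = 0.1),
  b = 2 * signal + rnorm(200, sd = 0.1),
  c = -signal + rnorm(200, sd = 0.1)
)

pca <- prcomp(toy)                         # columns are centered by default

# Cumulative share of the total variance explained by the components
cum_share <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Smallest number of components that reaches the (assumed) 95% threshold
k <- which(cum_share >= 0.95)[1]

# Keep only the first k columns of the rotated data
reduced <- pca$x[, seq_len(k), drop = FALSE]
print(dim(reduced))
```

The threshold is a judgment call; a different cutoff may suit your data better.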

9. Principal Component Analysis

Working with fewer variables usually speeds up your algorithms. In practice, PCA is rarely applied to datasets with just three variables; it is typically applied to datasets with tens or hundreds of them. How does PCA work?

10. PCA - rotation

Imagine that there are some data points described by two dimensions. PCA recasts the data into a vector space where the first dimension captures the most variance, the second dimension captures the second most, and so forth.

11. PCA - rotation

If we rotate the axes, most of the variability will be within one dimension rather than two.

12. PCA - rotation

Most of the variance is now explained by the first dimension, but some is still explained by the second.
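The rotation can be sketched in R on made-up two-dimensional points. The data and seed below are assumptions for illustration only. The sketch checks that the new axes are orthonormal, that rotating the centered points reproduces the scores computed by prcomp, and that the total variance is only redistributed, not changed.

```r
# Hypothetical two-dimensional data with correlated x and y
set.seed(1)                                # arbitrary seed, for reproducibility
x <- rnorm(100)
pts <- cbind(x = x, y = 0.8 * x + rnorm(100, sd = 0.3))

pca <- prcomp(pts)
R <- pca$rotation                          # 2 x 2 matrix holding the new axis directions

# The new axes are orthonormal: t(R) %*% R is the identity matrix
is_orthonormal <- isTRUE(all.equal(crossprod(R), diag(2), check.attributes = FALSE))

# Rotating the centered points reproduces the scores stored in pca$x
centered <- scale(pts, center = TRUE, scale = FALSE)
rotated <- centered %*% R
same_scores <- isTRUE(all.equal(rotated, pca$x, check.attributes = FALSE))

# Variance per axis before and after the rotation
var_before <- apply(pts, 2, var)
var_after <- apply(rotated, 2, var)
print(rbind(before = var_before, after = var_after))
```

After the rotation, the first axis carries most of the variance and the second carries what is left.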

13. PCA in R

Base R includes two functions for principal component analysis, namely prcomp and princomp, both in the stats package. The documentation notes that prcomp, which relies on a singular value decomposition, is generally preferred for numerical accuracy, so let's review this function.

14. PCA in R

The prcomp function returns an object of class prcomp. The predict function applied to an object of class prcomp returns the rotated data.
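A short sketch of these two calls, assuming the built-in iris dataset as stand-in data (it is not the dataset used in this lesson):

```r
# Fit PCA on the four numeric columns of the built-in iris data
pca <- prcomp(iris[, 1:4])

print(class(pca))                # the fitted object is of class "prcomp"
print(names(pca))                # components stored in the object

# Without newdata, predict() returns the rotated original data
scores <- predict(pca)

# With newdata, predict() centers the new rows and applies the same rotation
new_scores <- predict(pca, newdata = iris[1:5, 1:4])
print(new_scores)
```

The newdata argument is useful when you fit PCA on one dataset and need to express other observations in the same rotated coordinates.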

15. PCA in R

The summary of an object of class prcomp reports the importance of each component: its standard deviation, the proportion of variance it explains, and the cumulative proportion.
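For instance, again assuming the built-in iris dataset as stand-in data, the importance table can be printed or accessed directly:

```r
# Fit PCA on the numeric columns of the built-in iris data
pca <- prcomp(iris[, 1:4])

s <- summary(pca)
print(s)                         # prints the importance table

# The table is also available as a matrix, one column per component
imp <- s$importance
prop <- imp["Proportion of Variance", ]
print(prop)
```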

16. PCA in R

If you set the rank. parameter (note the trailing dot in its name), for example, to two, only the first two principal components will be created.
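A minimal sketch, assuming the built-in iris dataset as stand-in data:

```r
# Ask prcomp for at most two principal components
pca2 <- prcomp(iris[, 1:4], rank. = 2)

print(dim(pca2$rotation))        # one column per retained component
print(dim(pca2$x))               # rotated data, limited to the retained components
print(colnames(pca2$x))
```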

17. PCA in R

You can also limit the number of components by setting the tol parameter. Components are omitted if their standard deviations are less than or equal to tol times the standard deviation of the first component.
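The rule can be checked by hand. In this sketch, the iris dataset and the tol value of 0.2 are assumptions chosen only for illustration:

```r
# Full PCA, to inspect the standard deviations of all components
pca_full <- prcomp(iris[, 1:4])

# Standard deviation of each component relative to the first one
ratio <- pca_full$sdev / pca_full$sdev[1]
print(round(ratio, 3))

tol_value <- 0.2                 # hypothetical cutoff
pca_tol <- prcomp(iris[, 1:4], tol = tol_value)

# Components whose ratio exceeds tol are the ones that remain
expected_kept <- sum(ratio > tol_value)
print(ncol(pca_tol$x) == expected_kept)
```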

18. Summary

To summarize, we covered the application of PCA, rotation of axes and how to perform PCA in R using the prcomp function.

19. Let's practice!

Let's reduce some dimensions!
