
Intro to the Idea of PCA

1. Principal Component Analysis

Congratulations! Now that you've completed the first three chapters, you'll get a chance to put the ideas together via principal component analysis, or PCA, which is one of the most common techniques for dimension reduction in data science and machine learning.

2. Big Data

Data used to solve real-world problems, like the problem of choosing which college players to bring into the National Football League to play American football, often has not only a large number of rows (which is what we think of when we think of "big data"), but also a large number of columns. These columns can represent "features" in a machine learning model, for example, height and weight. Here, each row represents a player, with the name omitted.

3. Big Data - Redundancy

While the adage that "the more data the better" is largely true when thinking about rows, it's not always the case with columns. For example, if two variables are strongly correlated, they may be measuring the same phenomenon, and we don't want our models double-counting the same thing. Here, how fast a football player runs 20 yards is basically the same variable as how fast he runs 40 yards.
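To make the redundancy idea concrete, here is a minimal sketch that computes the Pearson correlation between two columns. The sprint times are hypothetical numbers invented for illustration, not the course's data set, and `pearson_correlation` is a helper name chosen for this example.

```python
import math


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered cross-products and centered squares.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sq_x = sum((x - mean_x) ** 2 for x in xs)
    sq_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(sq_x * sq_y)


# Hypothetical sprint times in seconds (assumed values, one entry per player).
twenty_yard = [2.60, 2.65, 2.70, 2.75, 2.80, 2.90]
forty_yard = [4.40, 4.48, 4.55, 4.66, 4.72, 4.90]

r = pearson_correlation(twenty_yard, forty_yard)
print(f"correlation between 20-yard and 40-yard times: {r:.3f}")
```

A value close to 1 suggests the two columns carry nearly the same information, which is the kind of redundancy described above.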

4. Principal Component Analysis

PCA is one of the more useful methods from applied linear algebra, but it is often not taught in an introductory linear algebra course at the university or college level. It's a non-parametric way of extracting meaningful information from confusing data sets, meaning that you do not have to set the structure of your data analysis a priori; the method does it for you. PCA uncovers hidden, low-dimensional structures that underlie your data, like the fact that how fast a football player runs 20 yards is a very similar variable to how fast he runs 40 yards. These structures (for example, two-dimensional approximations of four-dimensional data) are more easily visualized and are often interpretable to content experts.
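As a rough illustration of the low-dimensional structure PCA looks for, the sketch below runs PCA by hand on just two columns, using the closed-form eigenvalues of a 2x2 covariance matrix. The data are hypothetical sprint times invented for this example, `pca_2d` is a helper name chosen here, and the code assumes the two columns are not both constant. In practice you would normally call a library routine instead.

```python
import math


def pca_2d(xs, ys):
    """PCA for two variables: return (direction, scores, explained_ratio)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Center each column at zero.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(u * u for u in dx) / (n - 1)
    c = sum(v * v for v in dy) / (n - 1)
    b = sum(u * v for u, v in zip(dx, dy)) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    half_trace = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    lam1 = half_trace + radius  # variance along the first principal component
    lam2 = half_trace - radius  # variance along the second
    # Eigenvector belonging to lam1.
    if b != 0:
        vx, vy = b, lam1 - a
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)
    # Project the centered data onto the first principal component.
    scores = [u * direction[0] + v * direction[1] for u, v in zip(dx, dy)]
    explained_ratio = lam1 / (lam1 + lam2)
    return direction, scores, explained_ratio


# Hypothetical sprint times in seconds (assumed values, one entry per player).
twenty_yard = [2.60, 2.65, 2.70, 2.75, 2.80, 2.90]
forty_yard = [4.40, 4.48, 4.55, 4.66, 4.72, 4.90]

direction, scores, explained_ratio = pca_2d(twenty_yard, forty_yard)
print("first principal direction:", direction)
print(f"share of variance explained: {explained_ratio:.4f}")
```

When the explained share is close to 1, the single column of `scores` can stand in for both original columns with little loss, which is the sense in which PCA reduces dimension.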

5. Principal Component Analysis - Motivating Example

Coming back to our example of facial recognition, PCA can help us take grainy pixel data, reduce it to a less grainy version that maintains the essential features of the photograph, and store that smaller set of data for future use.

6. Let's practice!

Let's explore a few of these ideas with some exercises!
