Practical issues: scaling

You saw in the video that scaling your data before doing PCA changes the results of the PCA modeling. Here, you will perform PCA with and without scaling, then visualize the results using biplots.

Scaling is often appropriate when the variances of the variables differ substantially. This is commonly the case when variables are measured in different units, for example degrees Fahrenheit (temperature) and miles (distance). Deciding whether to scale is an important step in performing a principal component analysis.

The same Pokemon dataset is available in your workspace as pokemon, but one new variable has been added: Total.

There is some code at the top of the editor to calculate the mean and standard deviation of each variable in the model. Run this code to see how the scale of the variables differs in the original data.
Create a PCA model of pokemon with scaling, assigning the result to pr.with.scaling.
Create a PCA model of pokemon without scaling, assigning the result to pr.without.scaling.
Use biplot() to plot both models (one at a time) and compare their outputs.
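
The steps above can be sketched roughly as follows. This is a minimal sketch, not the exercise's reference solution: it assumes pokemon contains only numeric columns, and it builds a small simulated stand-in (with hypothetical column names and made-up values) when pokemon is not already in the workspace, so the code can run on its own.

```r
# Stand-in data, used only if `pokemon` is not already in the workspace.
# The column names and distributions here are assumptions for illustration;
# they are not the real dataset.
if (!exists("pokemon")) {
  set.seed(1)
  n <- 100
  pokemon <- data.frame(
    HitPoints = rnorm(n, mean = 70, sd = 25),
    Attack    = rnorm(n, mean = 75, sd = 30),
    Defense   = rnorm(n, mean = 70, sd = 30),
    Speed     = rnorm(n, mean = 65, sd = 28)
  )
  # Total is the sum of the other variables, so its variance is much larger
  pokemon$Total <- rowSums(pokemon)
}

# Mean and standard deviation of each variable
colMeans(pokemon)
apply(pokemon, 2, sd)

# PCA with scaling: each variable is standardized to unit variance first
pr.with.scaling <- prcomp(pokemon, scale. = TRUE)

# PCA without scaling: variables keep their original variances
pr.without.scaling <- prcomp(pokemon, scale. = FALSE)

# Plot the two models one at a time and compare
biplot(pr.with.scaling)
biplot(pr.without.scaling)
```

prcomp() centers the data by default; the scale. argument only controls whether each variable is also divided by its standard deviation. Without scaling, variables with large variances tend to dominate the first principal component, which is what the second biplot should make visible.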
