Practical issues: scaling
You saw in the video that scaling your data before doing PCA changes the results of the PCA modeling. Here, you will perform PCA with and without scaling, then visualize the results using biplots.
Scaling is usually appropriate when the variances of the variables are substantially different. This is commonly the case when variables have different units of measurement, for example, degrees Fahrenheit (temperature) and miles (distance). Deciding whether to scale is an important step in performing a principal component analysis.
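To make the point about differing variances concrete, here is a minimal sketch on made-up data. The data frame toy and its columns temp_f and dist_miles are hypothetical stand-ins, not part of the Pokemon dataset. It compares how much of the first principal component each variable accounts for with and without scaling.

```r
# Hypothetical toy data: two independent variables on very different scales
set.seed(42)
toy <- data.frame(
  temp_f     = rnorm(100, mean = 70,   sd = 10),    # degrees Fahrenheit
  dist_miles = rnorm(100, mean = 5000, sd = 1500)   # miles
)

# The variances differ by several orders of magnitude
apply(toy, 2, var)

# PCA without and with scaling (prcomp centers the data by default)
pca_raw    <- prcomp(toy, scale. = FALSE)
pca_scaled <- prcomp(toy, scale. = TRUE)

# Proportion of variance explained by each component
pve_raw    <- pca_raw$sdev^2 / sum(pca_raw$sdev^2)
pve_scaled <- pca_scaled$sdev^2 / sum(pca_scaled$sdev^2)

# Loadings of the first component:
# unscaled, PC1 is almost entirely dist_miles;
# scaled, both variables contribute with equal magnitude
pca_raw$rotation[, 1]
pca_scaled$rotation[, 1]
```

Without scaling, the first component essentially reproduces the high-variance variable, so the analysis mostly reflects the choice of units rather than structure in the data.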
This exercise is part of the course
Unsupervised Learning in R
Exercise instructions
The same Pokemon dataset is available in your workspace as pokemon, but one new variable has been added: Total.
- There is some code at the top of the editor to calculate the mean and standard deviation of each variable in the model. Run this code to see how the scale of the variables differs in the original data.
- Create a PCA model of pokemon with scaling, assigning the result to pr.with.scaling.
- Create a PCA model of pokemon without scaling, assigning the result to pr.without.scaling.
- Use biplot() to plot both models (one at a time) and compare their outputs.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Mean of each variable
colMeans(pokemon)
# Standard deviation of each variable
apply(pokemon, 2, sd)
# PCA model with scaling: pr.with.scaling
# PCA model without scaling: pr.without.scaling
# Create biplots of both for comparison
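One possible way to work through these steps is sketched below. Because the pokemon data frame only exists in the course workspace, the sketch assumes R's built-in USArrests dataset as a stand-in, held in a hypothetical variable dat. Swapping pokemon in for dat should give the exercise version, though this is an illustration rather than the course's reference solution.

```r
# Assumption: USArrests (built into R) stands in for the pokemon data
dat <- USArrests

# Mean and standard deviation of each variable
colMeans(dat)
apply(dat, 2, sd)

# PCA model with scaling: each variable is standardized to unit variance
pr.with.scaling <- prcomp(dat, scale. = TRUE)

# PCA model without scaling: variables keep their original variances
pr.without.scaling <- prcomp(dat, scale. = FALSE)

# Biplots of both models, one at a time, for comparison
biplot(pr.with.scaling)
biplot(pr.without.scaling)
```

In the unscaled biplot the arrow for the highest-variance variable tends to dominate the first component, while the scaled biplot spreads the loadings more evenly across variables.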