
Practical issues: scaling

You saw in the video that scaling your data before running PCA changes the results. Here, you will perform PCA with and without scaling, then visualize the results using biplots.

Scaling is often appropriate when the variances of the variables differ substantially. This is commonly the case when variables are measured in different units, for example, degrees Fahrenheit (temperature) and miles (distance). Deciding whether to scale is an important step in performing a principal component analysis.
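To make the effect of unequal variances concrete, here is a minimal sketch on synthetic data. The data frame `toy` and its columns `temperature_f` and `distance_mi` are invented for illustration and are not part of the exercise. It uses `prcomp()` from base R, whose `scale.` argument controls whether each variable is standardized to unit variance before the decomposition.

```r
# Synthetic data (hypothetical, for illustration only):
# two independent variables measured on very different scales
set.seed(42)
n <- 200
temperature_f <- rnorm(n, mean = 70, sd = 10)      # degrees Fahrenheit
distance_mi   <- rnorm(n, mean = 3000, sd = 1000)  # miles
toy <- data.frame(temperature_f, distance_mi)

# The variances differ by several orders of magnitude
apply(toy, 2, var)

# Without scaling, PC1 points almost entirely along the
# high-variance variable (distance_mi)
pr_raw <- prcomp(toy, scale. = FALSE)
abs(pr_raw$rotation[, "PC1"])

# With scaling, both variables have unit variance, so in this
# two-variable case they load on PC1 with equal magnitude
pr_std <- prcomp(toy, scale. = TRUE)
abs(pr_std$rotation[, "PC1"])
```

In the unscaled fit, the first component is essentially a copy of the distance variable, simply because its numbers are larger. After scaling, neither variable dominates because of its units.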

This exercise is part of the course

Unsupervised Learning in R

Instructions

The same Pokemon dataset is available in your workspace as pokemon, but one new variable has been added: Total.

  • There is some code at the top of the editor to calculate the mean and standard deviation of each variable in the model. Run this code to see how the scale of the variables differs in the original data.
  • Create a PCA model of pokemon with scaling, assigning the result to pr.with.scaling.
  • Create a PCA model of pokemon without scaling, assigning the result to pr.without.scaling.
  • Use biplot() to plot both models (one at a time) and compare their outputs.
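The steps above can be sketched as follows. Because `pokemon` exists only in the exercise workspace, this sketch assumes the built-in `USArrests` data frame as a stand-in; any all-numeric data frame should behave similarly. It is one possible way to carry out the comparison, not necessarily the course's reference solution.

```r
# Stand-in for `pokemon` (assumption): a built-in, all-numeric data frame
dat <- USArrests

# Compare the scale of each variable
colMeans(dat)
apply(dat, 2, sd)

# PCA with scaling: each variable is standardized to unit variance
pr.with.scaling <- prcomp(dat, scale. = TRUE)

# PCA without scaling: variables are centered but keep their original variance
pr.without.scaling <- prcomp(dat, scale. = FALSE)

# Biplots, one at a time, for visual comparison
biplot(pr.with.scaling)
biplot(pr.without.scaling)

# Proportion of variance explained by each component in both models
summary(pr.with.scaling)$importance["Proportion of Variance", ]
summary(pr.without.scaling)$importance["Proportion of Variance", ]
```

In the unscaled biplot, the arrow for the variable with the largest standard deviation tends to dominate the first component, while the scaled biplot spreads the loadings more evenly across variables.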

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Mean of each variable
colMeans(pokemon)

# Standard deviation of each variable
apply(pokemon, 2, sd)

# PCA model with scaling: pr.with.scaling


# PCA model without scaling: pr.without.scaling


# Create biplots of both for comparison
