
Practical matters: scaling

Recall from the video that clustering real data may require scaling the features if they have different distributions. So far in this chapter, you have been working with synthetic data that did not need scaling.
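To see why scaling matters, here is a minimal sketch using made-up data (the matrix `x` and its column names are hypothetical, not part of the course). When one feature has a much larger spread than another, Euclidean distances, and so the clustering built on them, are driven almost entirely by that feature.

```r
# Hypothetical toy data: two features on very different scales.
x <- cbind(height_cm = c(150, 160, 170, 180),
           weight_t  = c(0.05, 0.09, 0.06, 0.10))

# Unscaled Euclidean distances are dominated by height_cm.
d_raw <- dist(x)

# After scale(), each column has mean 0 and standard deviation 1,
# so both features contribute on a comparable footing.
x_scaled <- scale(x)
d_scaled <- dist(x_scaled)

print(round(d_raw, 2))
print(round(d_scaled, 2))
```

Comparing the two printed distance matrices shows how much the second feature's contribution changes once both columns share the same scale.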

In this exercise, you will go back to working with "real" data, the pokemon dataset introduced in the first chapter. You will observe the distribution (mean and standard deviation) of each feature, scale the data accordingly, then produce a hierarchical clustering model using the complete linkage method.

This exercise is part of the course

Unsupervised Learning in R


Instructions

The data is stored in the pokemon object in your workspace.

  • Observe the mean of each variable in pokemon using the colMeans() function.
  • Observe the standard deviation of each variable using the apply() and sd() functions. Since the variables are the columns of your matrix, make sure to specify 2 as the MARGIN argument to apply().
  • Scale the pokemon data using the scale() function and store the result in pokemon.scaled.
  • Create a hierarchical clustering model of the pokemon.scaled data using the complete linkage method. Manually specify the method argument and store the result in hclust.pokemon.
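The steps above can be sketched on a stand-in matrix. This is only an illustration: `pokemon_demo`, its column names, and its values are made up, since the course's `pokemon` object exists only in the exercise workspace. Note that `hclust()` takes a distance matrix, so the scaled data is passed through `dist()` first.

```r
# Stand-in for the course's pokemon matrix (names and values are made up).
set.seed(1)
pokemon_demo <- cbind(
  stat_a = rnorm(20, mean = 70, sd = 25),
  stat_b = rnorm(20, mean = 5,  sd = 1)
)

# View column means
print(colMeans(pokemon_demo))

# View column standard deviations (MARGIN = 2 applies sd() to each column)
print(apply(pokemon_demo, 2, sd))

# Scale the data: centre each column and divide by its standard deviation
pokemon_demo.scaled <- scale(pokemon_demo)

# Hierarchical clustering with complete linkage on Euclidean distances
hclust.demo <- hclust(dist(pokemon_demo.scaled), method = "complete")
```

"complete" is already the default `method` for `hclust()`, but naming it explicitly, as the instructions ask, makes the choice of linkage visible to the reader.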

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# View column means


# View column standard deviations


# Scale the data


# Create hierarchical clustering model: hclust.pokemon