
Effects of scale

You have learned that when one variable is on a larger scale than the other variables in your data, it may disproportionately influence the distance calculated between your observations. Let's see this in action by observing a sample of data from the trees data set.
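As a rough illustration of that effect, the sketch below uses made-up measurements (not taken from the trees data; the names obs, small_scale and large_scale are hypothetical) and compares the Euclidean distance computed from both variables with the distance computed from the large-scale variable alone.

```r
# Hypothetical data: two variables on very different scales
obs <- data.frame(
  small_scale = c(1, 2, 3),          # differs by single units
  large_scale = c(1000, 1500, 1100)  # differs by hundreds
)

# Euclidean distance using both variables
d_both <- dist(obs)

# Euclidean distance using only the large-scale variable
d_large <- dist(obs["large_scale"])

print(d_both)
print(d_large)

# Share of each pairwise distance explained by the large-scale variable alone
ratio <- as.vector(d_large) / as.vector(d_both)
print(ratio)
```

The ratios come out very close to 1, which suggests that the small-scale variable contributes almost nothing to the unscaled distances.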

You will use the scale() function, which by default centers and scales each column of your data.
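To make "centers and scales" concrete, here is a minimal sketch on a hypothetical data frame df (an assumption for illustration only). It compares the default output of scale() with subtracting each column's mean and dividing by its standard deviation by hand.

```r
# Hypothetical numeric data frame, used only for illustration
df <- data.frame(a = c(2, 4, 6), b = c(10, 40, 70))

# Defaults are center = TRUE and scale = TRUE
scaled <- scale(df)

# Manual equivalent: subtract the column mean, divide by the column standard deviation
manual <- sapply(df, function(x) (x - mean(x)) / sd(x))

print(scaled)
print(all.equal(as.vector(scaled), as.vector(manual)))

# After scaling, every column has mean 0 and standard deviation 1
print(colMeans(scaled))
print(apply(scaled, 2, sd))
```

Note that scale() returns a matrix rather than a data frame; dist() accepts either.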

Our variables are the following:

  • Girth - tree diameter in inches
  • Height - tree height in feet

This exercise is part of the course

Cluster Analysis in R


Exercise instructions

  • Calculate the distance matrix for the data frame three_trees and store it as dist_trees.
  • Create a new variable scaled_three_trees where the three_trees data is centered & scaled.
  • Calculate the distance matrix for scaled_three_trees and store it as dist_scaled_trees.
  • Output both the dist_trees and dist_scaled_trees matrices and observe which pair of observations has the smallest distance in each (hint: the closest pair changes).

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Calculate distance for three_trees 
dist_trees <- ___

# Scale three trees & calculate the distance  
scaled_three_trees <- ___
dist_scaled_trees <- ___

# Output the results of both matrices
print('Without Scaling')
___

print('With Scaling')
___
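For reference, one possible way to work through these steps is sketched below. The exercise's actual three_trees data is not shown on this page, so the values here are hypothetical stand-ins chosen for illustration; only dist(), scale() and print() are standard R functions.

```r
# Hypothetical stand-in for three_trees (the real values are not shown here)
three_trees <- data.frame(
  Girth  = c(8.0, 10.5, 10.0),  # diameter in inches
  Height = c(70, 73, 79)        # height in feet
)

# Calculate distance for three_trees (Euclidean by default)
dist_trees <- dist(three_trees)

# Scale three trees & calculate the distance
scaled_three_trees <- scale(three_trees)
dist_scaled_trees <- dist(scaled_three_trees)

# Output the results of both matrices
print('Without Scaling')
print(dist_trees)

print('With Scaling')
print(dist_scaled_trees)
```

With these made-up values, the closest pair is trees 1 and 2 before scaling and trees 2 and 3 after scaling, because Height no longer dominates the calculation.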