Exercise

Effects of scale

You have learned that when one variable is on a larger scale than the other variables in your data, it can disproportionately influence the distance calculated between your observations. Let's see this in action using a sample of data from the trees data set.
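Here is a minimal R sketch of the effect. It uses three hypothetical trees labelled A, B and C, with made-up values that are not from the trees data. Simply re-expressing Height in inches instead of feet changes which pair of trees is closest under Euclidean distance. The helper closest_pair() is also hypothetical, written only for this illustration, and is not part of base R.

```r
# Hypothetical measurements for three trees A, B and C (made-up values)
toy <- data.frame(Girth = c(10, 14, 10), Height = c(70, 70, 72),
                  row.names = c("A", "B", "C"))

# Hypothetical helper: label of the pair with the smallest distance
closest_pair <- function(d) {
  m <- as.matrix(d)
  diag(m) <- Inf  # ignore each tree's zero distance to itself
  idx <- which(m == min(m), arr.ind = TRUE)[1, ]
  paste(sort(rownames(m)[idx]), collapse = "-")
}

# Euclidean distances with Height in feet
dist_feet <- dist(toy)

# Same trees with Height converted to inches (1 ft = 12 in)
toy_inches <- toy
toy_inches$Height <- toy_inches$Height * 12
dist_inches <- dist(toy_inches)

closest_pair(dist_feet)    # "A-C"
closest_pair(dist_inches)  # "A-B"
```

Nothing about the trees changed, only the units. Once Height spans larger numbers, its differences dominate the sum of squares inside the distance.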

You will use the scale() function, which by default centers and scales each column: it subtracts the column mean and then divides by the column standard deviation.
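The sketch below uses a small hypothetical data frame with made-up values. It checks that scale() with its defaults gives the same result as subtracting each column's mean and dividing by its standard deviation, as computed by sd() with the n - 1 denominator. It also shows the "scaled:center" and "scaled:scale" attributes that scale() attaches to its result.

```r
# Small hypothetical data frame (made-up values for illustration)
df <- data.frame(Girth = c(8, 11, 14, 17), Height = c(63, 70, 77, 84))

# scale() with its defaults: center = TRUE, scale = TRUE
scaled <- scale(df)

# The same transformation by hand: subtract each column mean,
# then divide by each column's standard deviation (sd() uses n - 1)
m <- as.matrix(df)
manual <- sweep(m, 2, colMeans(m), "-")
manual <- sweep(manual, 2, apply(m, 2, sd), "/")

# Maximum difference between the two approaches (effectively zero)
max(abs(scaled - manual))

# scale() records the centers and scales it used as attributes
attr(scaled, "scaled:center")  # Girth 12.5, Height 73.5
attr(scaled, "scaled:scale")
```

After scaling, every column has mean 0 and standard deviation 1. Each variable then contributes on a comparable footing to any distance computed afterwards.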

Our variables are the following:

  • Girth - tree diameter in inches
  • Height - tree height in feet
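To see which of these two variables sits on the larger scale, you can compare their spread in the built-in trees data. This is a quick sketch. Height varies over a noticeably wider numeric range than Girth, which is why it tends to dominate unscaled distances.

```r
# trees ships with base R in the datasets package
data(trees)

# Standard deviation and range of the two variables used in this exercise
sapply(trees[, c("Girth", "Height")], sd)
sapply(trees[, c("Girth", "Height")], range)
```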

Instructions

  • Calculate the distance matrix for the data frame three_trees and store it as dist_trees.
  • Create a new variable scaled_three_trees in which the three_trees data is centered and scaled.
  • Calculate the distance matrix for scaled_three_trees and store it as dist_scaled_trees.
  • Print both dist_trees and dist_scaled_trees and note which pair of observations has the smallest distance in each matrix (hint: the closest pair changes after scaling).
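Here is a minimal R sketch of the four steps above. The exercise's actual three_trees rows are not given here, so the sketch builds a hypothetical stand-in from rows 1, 3 and 4 of trees, keeping only Girth and Height. This is an arbitrary choice. Substitute the real three_trees if you have it.

```r
data(trees)

# Hypothetical stand-in for the exercise's three_trees (rows chosen arbitrarily)
three_trees <- trees[c(1, 3, 4), c("Girth", "Height")]

# Euclidean distances on the raw (unscaled) measurements
dist_trees <- dist(three_trees)

# Center and scale each column (scale() defaults: center = TRUE, scale = TRUE)
scaled_three_trees <- scale(three_trees)

# Euclidean distances on the standardized measurements
dist_scaled_trees <- dist(scaled_three_trees)

# Compare which pair of trees is closest in each matrix
print(dist_trees)
print(dist_scaled_trees)
```

With only three rows, scale() computes each column's mean and standard deviation from those three trees alone. The scaled distances are therefore relative to this small sample rather than to the full trees data.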