
Towards clustering: distance measures

The similarity or dissimilarity of objects can be quantified with a distance measure. Many measures exist for different types of data; the most common, "ordinary" one is the Euclidean distance.
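To make the idea concrete, here is a minimal sketch that computes two distances by hand for a pair of made-up observations. The vectors x and y are hypothetical values chosen for easy arithmetic, and the Manhattan distance is included only because it is used later in this exercise.

```r
# Two toy observations (hypothetical values, not taken from the Boston data)
x <- c(1, 2, 3)
y <- c(4, 6, 3)

# Euclidean distance: square root of the summed squared differences
d_euclidean <- sqrt(sum((x - y)^2))

# Manhattan distance: sum of the absolute differences
d_manhattan <- sum(abs(x - y))

d_euclidean  # 5
d_manhattan  # 7
```

The coordinate differences are 3, 4 and 0, so the Euclidean distance is sqrt(9 + 16) = 5 and the Manhattan distance is 3 + 4 = 7.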

R has several functions for calculating distances. In this exercise, we use base R's dist() function, which computes the pairwise distances between the observations (rows) of a dataset and returns them as an object of class dist. The distance matrix is a square, symmetric matrix with one row and one column per observation. Because the number of pairwise distances grows quadratically with the number of observations, computing the distance matrix for a large dataset is time-consuming, and storing it can take a lot of memory.
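As a rough illustration of what dist() returns, the sketch below runs it on a small made-up matrix rather than on the exercise data. The object names toy, d_eu, d_man and the helper n_pairs() are assumptions introduced only for this example.

```r
# Toy data: 4 observations (rows) with 2 variables (columns); values are made up
toy <- matrix(c(0, 0,
                3, 4,
                6, 8,
                0, 5),
              ncol = 2, byrow = TRUE)

d_eu  <- dist(toy)                        # Euclidean is the default method
d_man <- dist(toy, method = "manhattan")  # Manhattan distances instead

class(d_eu)      # "dist"
length(d_eu)     # 6, i.e. n * (n - 1) / 2 pairwise distances for n = 4
as.matrix(d_eu)  # expand to the full symmetric 4 x 4 matrix

# Hypothetical helper: number of pairwise distances for n observations
n_pairs <- function(n) n * (n - 1) / 2
n_pairs(506)     # 127765 distances for a dataset with 506 rows, such as Boston
```

A dist object stores each pair only once, but the count still grows with the square of the number of observations, which is why large datasets become expensive.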

This exercise is part of the course

Helsinki Open Data Science


Exercise instructions

  • Load the MASS package and the Boston dataset from it.
  • Create dist_eu by calling the dist() function on the Boston dataset. Note that by default the function uses the Euclidean distance.
  • Look at the summary of dist_eu.
  • Next, create the object dist_man, which contains the Manhattan distance matrix of the Boston dataset.
  • Look at the summary of dist_man.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# load MASS and Boston
library(MASS)
data('Boston')

# euclidean distance matrix
dist_eu <- "change me!"

# look at the summary of the distances


# manhattan distance matrix
dist_man <- "change me!"

# look at the summary of the distances
