Towards clustering: distance measures
The similarity or dissimilarity of objects can be quantified with distance measures. Different measures suit different types of data. The most common, "ordinary" distance measure is Euclidean distance, the straight-line distance between two points.
R has functions for calculating distances. In this exercise, we will use base R's dist() function. The function computes the pairwise distances between the observations (rows) of a dataset and returns them as a dist object. Conceptually, the distance matrix is a square, symmetric matrix with one row and one column per observation. Because the number of pairwise distances grows quadratically with the number of observations, computing the distance matrix for a large dataset is time-consuming and storing it can take a lot of memory.
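To make the difference between the measures concrete, here is a minimal sketch on a toy matrix. The matrix `x` and the object names `d_eu` and `d_man` are made up for illustration and are not part of the exercise.

```r
# Three toy observations (rows) measured on two variables (columns)
x <- matrix(c(0, 0,
              3, 4,
              6, 8), ncol = 2, byrow = TRUE)

# Euclidean distance is the default method of dist()
d_eu <- dist(x)

# Manhattan distance: sum of absolute differences per variable
d_man <- dist(x, method = "manhattan")

# A dist object stores only the n * (n - 1) / 2 pairwise distances
length(d_eu)

# as.matrix() expands it to the full symmetric n x n matrix
as.matrix(d_eu)
as.matrix(d_man)
```

With three observations the dist object holds only three values, which is why the full square matrix appears only after calling as.matrix().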
This exercise is part of the course Helsinki Open Data Science.
Exercise instructions
- Load the MASS package and the Boston dataset from it.
- Create dist_eu by calling the dist() function on the Boston dataset. Note that by default, the function uses the Euclidean distance measure.
- Look at the summary of dist_eu.
- Next, create the object dist_man that contains the Manhattan distance matrix of the Boston dataset.
- Look at the summary of dist_man.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# load MASS and Boston
library(MASS)
data('Boston')
# euclidean distance matrix
dist_eu <- "change me!"
# look at the summary of the distances
# manhattan distance matrix
dist_man <- "change me!"
# look at the summary of the distances
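
One possible way to complete the template is sketched below. It assumes the MASS package is installed and uses the `method` argument of dist() to switch from the default Euclidean measure to Manhattan; it is offered as an illustration, not as the official model answer.

```r
# load MASS and Boston
library(MASS)
data("Boston")

# euclidean distance matrix (Euclidean is the default method)
dist_eu <- dist(Boston)

# look at the summary of the distances
summary(dist_eu)

# manhattan distance matrix
dist_man <- dist(Boston, method = "manhattan")

# look at the summary of the distances
summary(dist_man)
```

Because Boston is used unscaled here, variables with large numeric ranges contribute the most to both sets of distances. The Manhattan distance between two observations is never smaller than their Euclidean distance, so the summary of dist_man should show larger values.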