Towards clustering: distance measures
The similarity or dissimilarity of objects can be measured with distance measures. Different types of data call for different measures. The most common distance measure is the Euclidean distance, the straight-line distance between two points.
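The Euclidean distance is the square root of the sum of squared differences between two observations. The sketch below computes it by hand for two made-up observations and checks the result against base R's dist(). The vectors x and y are hypothetical illustration data, not part of the exercise.

```r
# Two hypothetical observations, each with three numeric variables
x <- c(1, 2, 3)
y <- c(4, 6, 3)

# Euclidean distance by hand: sqrt of the sum of squared differences
eu_manual <- sqrt(sum((x - y)^2))   # sqrt(9 + 16 + 0) = 5

# dist() treats each row as an observation; Euclidean is its default method
eu_dist <- dist(rbind(x, y))

eu_manual             # 5
as.numeric(eu_dist)   # 5
```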
R has several functions for calculating distances. In this exercise we will use base R's dist() function, which returns the distance matrix as a dist object. A distance matrix is a square matrix containing the pairwise distances between the observations. Because it is symmetric, a dist object stores only its lower triangle. With n observations there are n(n - 1)/2 pairwise distances, so for large datasets computing the distance matrix is time-consuming and storing it can take a lot of memory.
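The storage point can be checked directly. The sketch below computes the Euclidean distances for the Boston data (506 observations). It compares the number of values a dist object holds with n(n - 1)/2, then converts the object to a full square matrix with as.matrix(). object.size() is used only to compare the two representations. The exact byte counts depend on the R build, so they are not shown.

```r
library(MASS)   # provides the Boston dataset
data("Boston")

n <- nrow(Boston)   # 506 observations
d <- dist(Boston)   # Euclidean distances (the default method)

# A dist object stores only the lower triangle: n * (n - 1) / 2 values
length(d)           # 127765
n * (n - 1) / 2     # 127765

# as.matrix() expands it to the full symmetric n x n matrix
m <- as.matrix(d)
dim(m)              # 506 506

# The full matrix needs roughly twice the memory of the dist object
object.size(d)
object.size(m)
```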
This exercise is part of the course
Helsinki Open Data Science
Exercise instructions
- Load the MASS package and the Boston dataset from it.
- Create dist_eu by calling the dist() function on the Boston dataset. Note that by default, the function uses the Euclidean distance measure.
- Look at the summary of dist_eu.
- Next, create the object dist_man that contains the Manhattan distance matrix of the Boston dataset.
- Look at the summary of dist_man.
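The Manhattan distance is the sum of the absolute differences between two observations. dist() computes it when given method = "manhattan". The sketch below checks a hand calculation against dist() for two made-up observations. The vectors x and y are hypothetical illustration data, not part of the exercise.

```r
# Two hypothetical observations, each with three numeric variables
x <- c(1, 2, 3)
y <- c(4, 6, 3)

# Manhattan distance by hand: sum of absolute differences
man_manual <- sum(abs(x - y))   # 3 + 4 + 0 = 7

# The same distance from dist() with the Manhattan method
man_dist <- dist(rbind(x, y), method = "manhattan")

man_manual             # 7
as.numeric(man_dist)   # 7
```

For the same pair of points, the Manhattan distance is never smaller than the Euclidean distance. Here it is 7, while the straight-line distance between x and y is 5.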
Hands-on interactive exercise
Work through this exercise using the sample code.
# load MASS and Boston
library(MASS)
data('Boston')
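# euclidean distance matrix (dist() uses the Euclidean method by default)
dist_eu <- dist(Boston)
# look at the summary of the distances
summary(dist_eu)
# manhattan distance matrix
dist_man <- dist(Boston, method = "manhattan")
# look at the summary of the distances
summary(dist_man)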