Towards clustering: distance measures

The similarity or dissimilarity of objects can be quantified with a distance measure. Many different measures exist for different types of data. The most common or "normal" distance measure is Euclidean distance.
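
To make the idea concrete, here is a minimal sketch that computes two distance measures by hand: Euclidean distance and Manhattan distance, which this exercise also uses. The vectors a and b are hypothetical values chosen only for illustration; they are not part of the course material.

```r
# Two hypothetical observations with three numeric features each
a <- c(1, 2, 3)
b <- c(4, 6, 3)

# Euclidean distance: square root of the summed squared differences
euclidean <- sqrt(sum((a - b)^2))

# Manhattan distance: sum of the absolute differences
manhattan <- sum(abs(a - b))

euclidean  # 5
manhattan  # 7
```

The two measures generally give different values for the same pair of observations, which is why the choice of measure matters for clustering.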

R has several functions for calculating distances. In this exercise, we will use base R's dist() function. It computes the pairwise distances between the observations (rows) of a dataset and returns them as an object of class dist. Conceptually, the distance matrix is a square, symmetric matrix with one row and one column per observation, so its size grows quadratically with the number of observations. With large datasets, computing the distance matrix is therefore time-consuming, and storing it can take a lot of memory.
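
As a rough sketch of how dist() behaves, the example below applies it to a small toy matrix rather than to the exercise data. The matrix x and the names d_eu and d_man are assumptions made up for this illustration.

```r
# Hypothetical toy data: 4 observations (rows), 2 features (columns)
x <- matrix(c(0, 0,
              3, 4,
              6, 8,
              0, 5),
            ncol = 2, byrow = TRUE)

d_eu  <- dist(x)                        # Euclidean is the default method
d_man <- dist(x, method = "manhattan")  # Manhattan distance instead

class(d_eu)      # "dist"
length(d_eu)     # 6, i.e. 4 * 3 / 2 pairwise distances
as.matrix(d_eu)  # full symmetric 4 x 4 matrix with zeros on the diagonal
```

For n observations the object holds n(n-1)/2 pairwise distances, which is where the time and memory cost for large datasets comes from.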

This exercise is part of the course

Helsinki Open Data Science

Exercise instructions

Load the MASS package and the Boston dataset from it.
Create dist_eu by calling the dist() function on the Boston dataset. Note that, by default, the function uses the Euclidean distance measure.
Look at the summary of dist_eu.
Next, create the object dist_man that contains the Manhattan distance matrix of the Boston dataset.
Look at the summary of dist_man.

Hands-on interactive exercise

Try this exercise by completing the sample code.

# load MASS and Boston
library(MASS)
data('Boston')

# euclidean distance matrix
dist_eu <- "change me!"

# look at the summary of the distances


# manhattan distance matrix
dist_man <- "change me!"

# look at the summary of the distances
