K-means clustering

K-means is perhaps the most widely used and best-known clustering method. It is an unsupervised method that assigns observations to groups, or clusters, based on how similar the observations are to each other. In the previous exercise we got the hang of distances. The kmeans() function computes the distances it needs automatically, but it is good to know the basics. Let's cluster a bit!
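The idea behind k-means can be sketched in a few lines of base R. The hypothetical helper `lloyd_kmeans()` below implements the classic Lloyd iteration. It assigns each observation to its nearest center by squared Euclidean distance, moves each center to the mean of its assigned observations, and repeats until the assignments stop changing. It is an illustration only: R's `kmeans()` uses the Hartigan-Wong algorithm by default, although `algorithm = "Lloyd"` can be requested.

```r
# Hypothetical helper: a minimal sketch of Lloyd's k-means iteration in base R.
# `centers` is a k x p matrix of starting centers (one row per cluster).
lloyd_kmeans <- function(x, centers, iter_max = 100) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  k <- nrow(centers)
  cluster <- integer(nrow(x))
  for (iter in seq_len(iter_max)) {
    # assignment step: squared Euclidean distance from every observation
    # (rows of d) to every center (columns of d)
    d <- sapply(seq_len(k), function(j) colSums((t(x) - centers[j, ])^2))
    new_cluster <- max.col(-d, ties.method = "first")
    # stop once no observation changes cluster
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
    # update step: move each center to the mean of its observations
    # (an empty cluster keeps its previous center)
    for (j in seq_len(k)) {
      if (any(cluster == j)) {
        centers[j, ] <- colMeans(x[cluster == j, , drop = FALSE])
      }
    }
  }
  list(cluster = cluster, centers = centers)
}
```

Passing the starting centers explicitly keeps the sketch deterministic; in practice they are usually drawn at random from the data, which is why repeated kmeans() runs can give different clusters.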

This exercise is part of the course Helsinki Open Data Science.


Exercise instructions

First, set centers to 4 in the kmeans() call and run the clustering code.
Plot the Boston data with pairs(). Set the col argument so that the colors follow the clusters produced by k-means. You can access the cluster numbers with km$cluster. Which variables seem to affect the clustering results? Note: with pairs() you can reduce the number of variable pairs to see the plots more clearly. On line 7, replace Boston with, for example, Boston[6:10] to pair up 5 columns (columns 6 to 10).
Try different numbers of clusters: 1, 2, and 3 (leave it at 3). Visualize the results each time.
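One possible way to work through these steps is sketched below. It assumes the Boston data frame comes from the MASS package (the exercise only says it is available), and the seed value 123 is an arbitrary choice that makes the random starting centers reproducible.

```r
library(MASS)  # assumption: the Boston data frame comes from the MASS package

set.seed(123)  # arbitrary seed so the random initial centers are reproducible

# step 1: four clusters, shown in a pairs plot of columns 6 to 10
km <- kmeans(Boston, centers = 4)
pairs(Boston[6:10], col = km$cluster)

# step 3: compare 1, 2 and 3 clusters, finishing with k = 3
for (k in 1:3) {
  km <- kmeans(Boston, centers = k)
  pairs(Boston[6:10], col = km$cluster, main = paste("k =", k))
}
```

Because km$cluster holds the integers 1 to k, passing it to col maps each cluster to one color of the current palette.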

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Boston dataset is available

# k-means clustering
km <- kmeans(Boston, centers = "change me!")

# plot the Boston dataset with clusters
pairs(Boston, col = "change me!")
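To back up what the pairs plot suggests about which variables drive the clustering, one can also look at the cluster centers in km$centers. The sketch below again assumes Boston comes from MASS and uses an arbitrary seed; it ranks variables by how far apart their cluster centers lie. Since the data are not scaled here, variables measured on large numeric scales tend to dominate the Euclidean distance and therefore tend to separate the clusters most.

```r
library(MASS)  # assumption: the Boston data frame comes from the MASS package

set.seed(123)  # arbitrary seed for reproducible starting centers
km <- kmeans(Boston, centers = 3)

# cluster centers: one row per cluster, one column per variable
round(km$centers, 2)

# spread of each variable's centers across clusters, largest first
center_range <- apply(km$centers, 2, function(v) diff(range(v)))
sort(center_range, decreasing = TRUE)
```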
