K-means clustering

K-means is perhaps the most widely used and best-known clustering method. It is an unsupervised method that assigns observations to groups, or clusters, based on their similarity. In the previous exercise we got the hang of distances. The kmeans() function calculates the distances it needs automatically, but it is good to know the basics. Let's cluster a bit!
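
As a rough illustration of the link between distances and clustering, the sketch below prints the Euclidean distances between a few observations and then runs kmeans() on the same data. It assumes that Boston is the data set shipped with the MASS package; the seed value and the choice of three centers are arbitrary and not part of the exercise.

```r
# Assumption: Boston is the data set from the MASS package
library(MASS)
data("Boston")

# Euclidean distances between the first five observations
# (dist() uses the Euclidean method by default)
print(dist(Boston[1:5, ]))

# k-means starts from randomly chosen centers, so fix the seed
# (the value 123 is arbitrary) to make the run reproducible
set.seed(123)
km <- kmeans(Boston, centers = 3)

# Number of observations assigned to each cluster
print(table(km$cluster))

# Cluster centers: the mean of every variable within each cluster
print(km$centers)
```

Because the distances are Euclidean, variables measured on a large scale can dominate the result when the data are not standardized.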

This exercise is part of the course Helsinki Open Data Science.

Exercise instructions

First, change the centers argument in the kmeans() function to 4 and execute the clustering code.
Plot the Boston data with pairs(). Adjust the code by adding the col argument and set the color based on the clusters that k-means produced. You can access the cluster numbers with km$cluster. Which variables seem to affect the clustering results? Note: With pairs() you can reduce the number of pairs to see the plots more clearly. On line 7, just replace Boston with, for example, Boston[6:10] to pair up 5 columns (columns 6 to 10).
Try different numbers of clusters: 1, 2 and 3 (leave it at 3). Visualize the results.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Boston dataset is available

# k-means clustering
km <- kmeans(Boston, centers = "change me!")

# plot the Boston dataset with clusters
pairs(Boston, col = "change me!")
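
One possible way to fill in the template is sketched below. It is not an official solution: it assumes Boston comes from the MASS package, and the set.seed() call is an addition that only makes the random starting centers reproducible.

```r
# Assumption: Boston is the data set from the MASS package
library(MASS)
data("Boston")

# Arbitrary seed so that repeated runs give the same clusters
set.seed(123)

# Step 1: k-means clustering with 4 centers
km <- kmeans(Boston, centers = 4)

# Step 2: color each point by its cluster number;
# columns 6 to 10 are used to keep the plot readable
pairs(Boston[6:10], col = km$cluster)

# Step 3: rerun with 3 centers and visualize again
km <- kmeans(Boston, centers = 3)
pairs(Boston[6:10], col = km$cluster)
```

Passing km$cluster to col works because the cluster numbers are integers, which pairs() looks up in the current color palette.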
