K-means clustering
K-means is probably the best-known and most widely used clustering method. It is an unsupervised method that assigns observations to groups, or clusters, based on how similar they are. In the previous exercise we got the hang of distances. The kmeans() function calculates the distances it needs automatically, but it is good to know the basics. Let's cluster a bit!
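To make the link between distances and clusters concrete, here is a minimal sketch on made-up toy data (the data frame toy and the names km_toy, dist_to_centers and nearest are hypothetical and not part of the exercise). It computes the Euclidean distance from each observation to each fitted cluster center by hand and checks that every observation sits in the cluster whose center is closest.

```r
# Toy data: two well-separated groups of points (invented for illustration)
toy <- data.frame(
  x = c(1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9),
  y = c(1.0, 0.9, 1.1, 1.2, 0.8, 5.0, 4.9, 5.1, 5.2, 4.8)
)

set.seed(123)  # k-means starts from random centers, so fix the seed
km_toy <- kmeans(toy, centers = 2)

# Euclidean distance from every observation to every cluster center
dist_to_centers <- sapply(seq_len(nrow(km_toy$centers)), function(k) {
  sqrt(rowSums(sweep(as.matrix(toy), 2, km_toy$centers[k, ])^2))
})

# Index of the closest center for each observation
nearest <- apply(dist_to_centers, 1, which.min)

# Compare the hand-computed assignment with the one reported by kmeans()
all(nearest == km_toy$cluster)
```

This is only meant to show what kmeans() takes care of internally; in the exercise itself you just call the function.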
This exercise is part of the course
Helsinki Open Data Science
Exercise instructions
- First change the centers argument in the kmeans() function to 4 and execute the clustering code.
- Plot the Boston data with pairs(). Adjust the code by adding the col argument, and set the color based on the clusters that k-means produced. You can access the cluster numbers with km$cluster. Which variables seem to affect the clustering results? Note: with pairs() you can reduce the number of pairs to see the plots more clearly. In the pairs() call, just replace Boston with, for example, Boston[6:10] to pair up 5 columns (columns 6 to 10).
- Try a different number of clusters: 1, 2 and 3 (leave it at 3). Visualize the results.
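The last step, trying several numbers of clusters, could be sketched as a small loop. This assumes the Boston data comes from the MASS package (the exercise only says it is available); the seed, the loop variable k and the name km_k are choices made for this illustration.

```r
library(MASS)   # assumption: Boston is taken from the MASS package
data("Boston")

set.seed(123)   # fix the random starting centers for reproducibility

for (k in 1:3) {
  # Fit k-means with k clusters
  km_k <- kmeans(Boston, centers = k)

  # Report how many observations ended up in each cluster
  cat("k =", k, "cluster sizes:", km_k$size, "\n")

  # Plot columns 6 to 10, colored by cluster membership
  pairs(Boston[6:10], col = km_k$cluster, main = paste("k =", k))
}
```

After the loop, km_k holds the fit for 3 clusters, which is the value the instructions ask you to leave in place.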
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Boston dataset is available
# k-means clustering
km <- kmeans(Boston, centers = "change me!")
# plot the Boston dataset with clusters
pairs(Boston, col = "change me!")
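One possible way to fill in the placeholders for the first two instructions is sketched below. It is not the official solution: loading Boston from the MASS package and fixing a seed are assumptions added so that the snippet runs on its own.

```r
library(MASS)   # assumption: Boston is taken from the MASS package
data("Boston")

set.seed(123)   # assumption: seed chosen only to make the run repeatable

# k-means clustering with 4 centers, as the first instruction asks
km <- kmeans(Boston, centers = 4)

# plot a subset of the Boston columns, colored by the k-means clusters
pairs(Boston[6:10], col = km$cluster)
```

Because km$cluster is a vector of integers from 1 to 4, pairs() maps each cluster to one color of the current palette, so points from the same cluster share a color across all panels.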