K-means clustering
K-means is probably the best-known and most widely used clustering method. It is an unsupervised method that assigns observations to groups, or clusters, based on their similarity. In the previous exercise we got the hang of distances. The kmeans() function calculates the distances it needs automatically, but it is good to know the basics. Let's cluster a bit!
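To make the point about distances concrete, here is a minimal sketch on made-up toy data (the matrix x, the seed and the name km_toy are illustrative assumptions, not part of the exercise). It passes raw observations to kmeans() and then recomputes by hand the squared Euclidean distance from each point to its assigned cluster centre.

```r
# Toy data: two well-separated groups of 2-D points (made up for illustration)
set.seed(123)
x <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 3, sd = 0.3), ncol = 2)
)

# kmeans() takes the raw observations, not a precomputed distance matrix
km_toy <- kmeans(x, centers = 2, nstart = 10)

# Look up the centre assigned to each point, then compute squared distances
assigned_centers <- km_toy$centers[km_toy$cluster, ]
sq_dist <- rowSums((x - assigned_centers)^2)

# Compare the hand-computed total with the value reported by kmeans()
all.equal(sum(sq_dist), km_toy$tot.withinss)
```

The comparison should agree because tot.withinss is the total within-cluster sum of squares, which is the quantity k-means tries to make small.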
This exercise is part of the course
Helsinki Open Data Science
Exercise instructions
- First, change centers in the kmeans() function to 4 and execute the clustering code.
- Plot the Boston data with pairs(). Adjust the code by adding the col argument, and set the color based on the clusters that k-means produced. You can access the cluster numbers with km$cluster. Which variables seem to affect the clustering results? Note: with pairs() you can reduce the number of pairs to see the plots more clearly. In the pairs() call, just replace Boston with, for example, Boston[6:10] to pair up 5 columns (columns 6 to 10).
- Try different numbers of clusters: 1, 2 and 3 (leave it at 3). Visualize the results.
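The steps above could be sketched roughly as follows. This is one possible way through the exercise, not an official solution. It assumes that Boston is the data frame shipped with the MASS package (the exercise only says the dataset is available) and uses an arbitrary seed so that the cluster assignment is repeatable.

```r
# Assumption: Boston is the data frame from the MASS package
library(MASS)
data("Boston")

# k-means starts from random centres, so fix a seed (value chosen arbitrarily)
set.seed(13)

# k-means clustering with 3 clusters
km <- kmeans(Boston, centers = 3)

# How many observations ended up in each cluster
table(km$cluster)

# Plot columns 6 to 10 only, colouring each point by its cluster number
pairs(Boston[6:10], col = km$cluster)
```

When the variables are on very different scales, those with the largest numeric ranges tend to dominate the Euclidean distance, which is worth keeping in mind when judging which variables drive the clusters.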
Hands-on interactive exercise
Try this exercise by completing the sample code.
# Boston dataset is available
# k-means clustering
km <- kmeans(Boston, centers = "change me!")
# plot the Boston dataset with clusters
pairs(Boston, col = "change me!")