Handling random algorithms
In the video, you saw how kmeans() randomly initializes the centers of clusters. This random initialization can result in observations being assigned to different cluster labels, and it can also cause the algorithm to converge to different local minima. This exercise demonstrates both effects.
At the top of each plot, the measure of model quality (the total within-cluster sum of squares) will be displayed. Look for the model(s) with the lowest error to identify the best-performing runs.
Because kmeans() initializes the cluster centers at random, it is important to set the random number generator seed for reproducibility.
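As a quick illustration of this variability (a minimal sketch using a small synthetic data set, not the course's x), repeated single-start runs can land in different local minima, while fixing the seed makes a run reproducible:
# Illustrative toy data standing in for x (two Gaussian clusters)
set.seed(42)
toy <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
             matrix(rnorm(100, mean = 3), ncol = 2))

# Two single-start runs may converge to different local minima,
# so their total within-cluster sums of squares can differ
kmeans(toy, centers = 3, nstart = 1)$tot.withinss
kmeans(toy, centers = 3, nstart = 1)$tot.withinss

# Setting the seed before each call makes the random initialization,
# and therefore the fitted model, reproducible
set.seed(1)
kmeans(toy, centers = 3, nstart = 1)$tot.withinss
set.seed(1)
kmeans(toy, centers = 3, nstart = 1)$tot.withinss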
This exercise is part of the course Unsupervised Learning in R.
Exercise instructions
The data, x, is still available in your workspace. Your task is to generate six kmeans() models on the data, plotting the results of each, in order to see the impact of random initializations on model results.
- Set the random number seed to 1 with set.seed().
- For each iteration of the for loop, run kmeans() on x. Assume the number of clusters is 3 and the number of starts (nstart) is 1.
- Visualize the cluster memberships using the col argument to plot(). Observe how the measure of quality and the cluster assignments vary among the six model runs.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Set up 2 x 3 plotting grid
par(mfrow = c(2, 3))

# Set seed
set.seed(___)

for(i in 1:6) {
  # Run kmeans() on x with three clusters and one start
  km.out <- kmeans(___, ___, ___)

  # Plot clusters
  plot(x, col = ___,
       main = km.out$tot.withinss,
       xlab = "", ylab = "")
}
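For reference, here is one way the blanks could be filled in, following the instructions above (seed 1, three clusters, a single start, and coloring points by km.out$cluster); treat it as a sketch of a possible solution rather than the only correct answer.
# Set up 2 x 3 plotting grid
par(mfrow = c(2, 3))

# Set seed
set.seed(1)

for(i in 1:6) {
  # Run kmeans() on x with three clusters and one start
  km.out <- kmeans(x, centers = 3, nstart = 1)

  # Plot clusters colored by assignment, titled with the total
  # within-cluster sum of squares
  plot(x, col = km.out$cluster,
       main = km.out$tot.withinss,
       xlab = "", ylab = "")
}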