Handling random algorithms

In the video, you saw how kmeans() randomly initializes the centers of clusters. This random initialization can result in assigning observations to different cluster labels. Also, the random initialization can result in finding different local minima for the k-means algorithm. This exercise will demonstrate both results.

At the top of each plot, the measure of model quality—total within cluster sum of squares error—will be plotted. Look for the model(s) with the lowest error to find models with the better model results.

Because kmeans() initializes observations to random clusters, it is important to set the random number generator seed for reproducibility.

Este exercício faz parte do curso

Unsupervised Learning in R

Ver curso

Instruções do exercício

The data, x, is still available in your workspace. Your task is to generate six kmeans() models on the data, plotting the results of each, in order to see the impact of random initializations on model results.

Set the random number seed to 1 with set.seed().
For each iteration of the for loop, run kmeans() on x. Assume the number of clusters is 3 and number of starts (nstart) is 1.
Visualize the cluster memberships using the col argument to plot(). Observe how the measure of quality and cluster assignments vary among the six model runs.

Exercício interativo prático

Experimente este exercício completando este código de exemplo.

# Set up 2 x 3 plotting grid
par(mfrow = c(2, 3))

# Set seed
set.seed(___)

for(i in 1:6) {
  # Run kmeans() on x with three clusters and one start
  km.out <- kmeans(___, ___, ___)
  
  # Plot clusters
  plot(x, col = ___, 
       main = km.out$tot.withinss, 
       xlab = "", ylab = "")
}

Editar e executar o código