K-means: Elbow analysis

In the previous exercises, you used the dendrogram to propose a clustering that cut the tree into 3 subtrees (clusters). In this exercise, you will use the k-means elbow plot to propose the "best" number of clusters.

This exercise is part of the course

Cluster Analysis in R

Exercise instructions

  • Use map_dbl() to run kmeans() using the oes data for k values ranging from 1 to 10 and extract the total within-cluster sum of squares value from each model: model$tot.withinss. Store the resulting vector as tot_withinss.
  • Build a new data frame elbow_df containing the values of k and the vector of total within-cluster sum of squares.
  • Use the values in elbow_df to plot a line plot showing the relationship between k and total within-cluster sum of squares.
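
The three steps above can be sketched end to end on a small synthetic data set. This is only an illustration under stated assumptions: toy_data is a made-up stand-in for the course's oes data, the nstart = 10 argument is an optional addition to make the runs more stable, and the purrr and ggplot2 packages are assumed to be installed.

```r
library(purrr)
library(ggplot2)

# Hypothetical stand-in for the oes data: three well-separated 2-D groups
set.seed(42)
toy_data <- rbind(
  matrix(rnorm(60, mean = 0,  sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 5,  sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 10, sd = 0.5), ncol = 2)
)

# Fit one k-means model per k and keep only the total within-cluster sum of squares
tot_withinss <- map_dbl(1:10, function(k) {
  model <- kmeans(x = toy_data, centers = k, nstart = 10)  # nstart is an added assumption
  model$tot.withinss
})

# Pair each k with its total within-cluster sum of squares
elbow_df <- data.frame(k = 1:10, tot_withinss = tot_withinss)

# Line plot of k against total within-cluster sum of squares
elbow_plot <- ggplot(elbow_df, aes(x = k, y = tot_withinss)) +
  geom_line() +
  scale_x_continuous(breaks = 1:10)
print(elbow_plot)
```

Because the toy data was generated with three groups, the curve should drop steeply up to k = 3 and flatten afterwards. On real data the bend is often less pronounced, so the elbow is a suggestion rather than a definitive answer.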

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Use map_dbl to run many models with varying values of k (centers)
tot_withinss <- map_dbl(1:10, function(k) {
  model <- kmeans(x = ___, centers = ___)
  model$tot.withinss
})

# Generate a data frame containing both k and tot_withinss
elbow_df <- data.frame(
  k = ___,
  tot_withinss = ___
)

# Plot the elbow plot
ggplot(elbow_df, aes(x = ___, y = ___)) +
  geom_line() +
  scale_x_continuous(breaks = 1:10)
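
To make the quantity on the y-axis concrete, the following sketch recomputes model$tot.withinss by hand as the sum of squared distances between each observation and the center of its assigned cluster. The data here is randomly generated purely for illustration, and the variable names are hypothetical; only base R is used.

```r
# Made-up data for illustration only (50 observations, 2 features)
set.seed(1)
toy_data <- matrix(rnorm(100), ncol = 2)

model <- kmeans(toy_data, centers = 3, nstart = 5)

# Look up the center assigned to each observation (one row per observation)
assigned_centers <- model$centers[model$cluster, , drop = FALSE]

# Sum of squared distances from every observation to its own cluster center
manual_tot_withinss <- sum((toy_data - assigned_centers)^2)

# The two values should agree up to floating-point tolerance
print(all.equal(manual_tot_withinss, model$tot.withinss))
```

Adding clusters can only bring observations closer to their nearest center, which is why this total tends to shrink as k grows and why the elbow plot looks for the point of diminishing returns rather than the minimum.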