
Many k's, many models

While the lineup dataset clearly has a known value of k, the optimal number of clusters is often unknown and must be estimated.

In this exercise, you will use map_dbl() from the purrr package to run k-means for values of k from 1 to 10 and extract the total within-cluster sum of squares from each model. This is the first step towards building the elbow plot.
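To make the metric concrete, here is a minimal base-R sketch of what kmeans() reports as tot.withinss: the sum, over all clusters, of squared distances from each point to its cluster mean. The matrix pts is random stand-in data (an assumption for illustration, not the lineup data), and the choice of 3 centers is arbitrary.

```r
set.seed(1)

# Stand-in data (an assumption): 100 random points in two dimensions.
pts <- matrix(rnorm(200), ncol = 2)

model <- kmeans(x = pts, centers = 3)

# Recompute the metric by hand: for each cluster, subtract the cluster
# mean from its members and sum the squared differences.
manual <- sum(sapply(1:3, function(j) {
  members <- pts[model$cluster == j, , drop = FALSE]
  centered <- sweep(members, 2, colMeans(members))
  sum(centered^2)
}))

# The hand-computed value should agree with the value stored on the model.
print(all.equal(manual, model$tot.withinss))

# With a single cluster, nothing is explained by the clustering, so the
# within-cluster total equals the total sum of squares.
one <- kmeans(x = pts, centers = 1)
print(all.equal(one$tot.withinss, one$totss))
```

Because adding clusters can only shrink these distances for a well-fitted model, the metric generally falls as k grows, which is why the shape of the curve matters more than any single value.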

This exercise is part of the course

Cluster Analysis in R


Exercise instructions

  • Use map_dbl() to run kmeans() using the lineup data for k values ranging from 1 to 10 and extract the total within-cluster sum of squares value from each model: model$tot.withinss. Store the resulting vector as tot_withinss.
  • Build a new data frame elbow_df containing the values of k and the vector of total within-cluster sum of squares.
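The two steps above can be sketched on synthetic data. This is an illustration under stated assumptions, not the exercise solution: toy, toy_withinss, and toy_elbow are hypothetical names, the data are three simulated groups rather than the lineup data, and nstart = 20 is an optional setting chosen here to make each fit less sensitive to its random start.

```r
library(purrr)

set.seed(42)

# Synthetic stand-in data (an assumption): three well-separated 2-D groups
# of 50 points each.
toy <- data.frame(
  x = c(rnorm(50, mean = 0), rnorm(50, mean = 10), rnorm(50, mean = 20)),
  y = c(rnorm(50, mean = 0), rnorm(50, mean = 10), rnorm(50, mean = 0))
)

# Fit one k-means model per value of k and keep only tot.withinss.
# map_dbl() returns a plain numeric vector, one element per k.
toy_withinss <- map_dbl(1:6, function(k) {
  model <- kmeans(x = toy, centers = k, nstart = 20)
  model$tot.withinss
})

# Pair each k with its metric, ready for plotting.
toy_elbow <- data.frame(
  k = 1:6,
  tot_withinss = toy_withinss
)

print(toy_elbow)
```

With groups this well separated, the values should drop sharply up to k = 3 and flatten afterwards; that bend is the "elbow" the plot is meant to reveal.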

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

library(purrr)

# Use map_dbl to run many models with varying value of k (centers)
tot_withinss <- map_dbl(1:10, function(k) {
  model <- kmeans(x = ___, centers = ___)
  model$tot.withinss
})

# Generate a data frame containing both k and tot_withinss
elbow_df <- data.frame(
  k = ___ ,
  tot_withinss = ___
)