Many K's many models
While the lineup dataset clearly has a known value of k, the optimal number of clusters is often not known and must be estimated.
In this exercise, you will use map_dbl() from the purrr package to run k-means for values of k from 1 to 10 and extract the total within-cluster sum of squares from each model. This is the first step towards visualizing the elbow plot.
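To make the metric concrete: for a fitted model, tot.withinss is the sum over all clusters of the squared Euclidean distances between each observation and its assigned cluster center. The sketch below checks this against kmeans() by hand. It uses the four numeric columns of the built-in iris dataset as stand-in data, which is an assumption made only because lineup is not available outside the course. The seed value is arbitrary.

```r
# Stand-in data (assumption): numeric iris columns instead of lineup
x <- as.matrix(iris[, 1:4])

set.seed(42)  # k-means picks random starting centers; fix them for reproducibility
model <- kmeans(x, centers = 3)

# One row per observation: the center of the cluster it was assigned to
assigned_centers <- model$centers[model$cluster, ]

# Total within-cluster sum of squares computed by hand
manual_withinss <- sum((x - assigned_centers)^2)

manual_withinss
model$tot.withinss      # the value kmeans() reports
sum(model$withinss)     # tot.withinss is also the sum of the per-cluster values
```

Because tot.withinss is a single number per model, it is exactly the kind of value map_dbl() can collect across many values of k.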
This exercise is part of the course Cluster Analysis in R.
Exercise instructions
- Use map_dbl() to run kmeans() on the lineup data for k values ranging from 1 to 10, and extract the total within-cluster sum of squares from each model (model$tot.withinss). Store the resulting vector as tot_withinss.
- Build a new data frame elbow_df containing the values of k and the vector of total within-cluster sum of squares.
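map_dbl() applies a function to each element of its input and returns a double vector of the same length; it fails if any call does not return a single number. That is why the function passed to it returns only model$tot.withinss rather than the whole model object. The minimal sketch below illustrates that contract on toy inputs, with base R's vapply() shown as an equivalent for comparison (a swapped-in illustration, not part of the exercise).

```r
library(purrr)

# Each call returns exactly one number, so map_dbl() collects the results
# into a plain numeric (double) vector
squares <- map_dbl(1:5, function(k) k^2)
squares

# Base R equivalent: vapply() with a numeric(1) template enforces the same
# "one double per element" rule
squares_base <- vapply(1:5, function(k) k^2, numeric(1))

# Returning something that is not a single number (here a two-element list)
# makes map_dbl() raise an error, which we catch and replace with a marker
result <- tryCatch(
  map_dbl(1:2, function(k) list(a = k, b = k)),
  error = function(e) "error"
)
result
```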
Sample code for this exercise, with the blanks filled in:

library(purrr)

# Use map_dbl() to run one k-means model per value of k (centers)
# and keep only each model's total within-cluster sum of squares
tot_withinss <- map_dbl(1:10, function(k) {
  model <- kmeans(x = lineup, centers = k)
  model$tot.withinss
})

# Generate a data frame containing both k and tot_withinss
elbow_df <- data.frame(
  k = 1:10,
  tot_withinss = tot_withinss
)
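With elbow_df built, the next step is to plot tot_withinss against k and look for the "elbow" where adding more clusters stops reducing the metric sharply. Below is a minimal, self-contained sketch under several assumptions: the numeric iris columns stand in for lineup, base R graphics are used for the plot (the course may use a different plotting package), and nstart = 20 is an illustrative addition that reruns each model from several random starts so the curve is less noisy.

```r
library(purrr)

# Stand-in data (assumption): numeric iris columns instead of lineup
x <- as.matrix(iris[, 1:4])

set.seed(42)  # make the random k-means starts reproducible
tot_withinss <- map_dbl(1:10, function(k) {
  # nstart = 20 (assumption): keep the best of 20 random starts per k
  model <- kmeans(x = x, centers = k, nstart = 20)
  model$tot.withinss
})

elbow_df <- data.frame(k = 1:10, tot_withinss = tot_withinss)

# Elbow plot: total within-cluster sum of squares for each k
plot(elbow_df$k, elbow_df$tot_withinss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares",
     xaxt = "n")
axis(1, at = elbow_df$k)  # one tick per tested value of k
```

With k = 1 every observation shares a single center (the overall mean), so the first value equals the total sum of squares of the data; the curve falls from there as k grows.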