K-means: Elbow analysis
In the previous exercises you used the dendrogram to propose a clustering that generated 3 trees. In this exercise you will leverage the k-means elbow plot to propose the "best" number of clusters.
This exercise is part of the course
Cluster Analysis in R
Exercise instructions
- Use map_dbl() to run kmeans() using the oes data for k values ranging from 1 to 10 and extract the total within-cluster sum of squares value from each model: model$tot.withinss. Store the resulting vector as tot_withinss.
- Build a new data frame elbow_df containing the values of k and the vector of total within-cluster sum of squares.
- Use the values in elbow_df to plot a line plot showing the relationship between k and total within-cluster sum of squares.
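To make the quantity named in the first instruction concrete: the tot.withinss component of a kmeans() result is the sum of the per-cluster within-cluster sums of squares, and with a single cluster it equals the total sum of squares. The sketch below checks both facts in base R. The numeric columns of the built-in iris data stand in for oes here; that substitution, and the nstart value, are assumptions made only so the example runs on its own.

```r
# Illustration only: iris stands in for the course's oes data.
x <- as.matrix(iris[, 1:4])

set.seed(42)  # k-means starts from random centers, so fix the seed
model <- kmeans(x = x, centers = 3, nstart = 10)

# tot.withinss is the sum of the within-cluster sums of squares
all.equal(model$tot.withinss, sum(model$withinss))

# With one cluster every point is "within" it, so tot.withinss equals totss
one_cluster <- kmeans(x = x, centers = 1)
all.equal(one_cluster$tot.withinss, one_cluster$totss)
```

Because adding clusters can only shrink this total for a well-fitted model, the elbow plot looks for the value of k after which the decrease becomes small.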
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Load the packages that provide map_dbl() and ggplot()
library(purrr)
library(ggplot2)

# Use map_dbl to run many models with varying value of k (centers)
tot_withinss <- map_dbl(1:10, function(k){
  model <- kmeans(x = ___, centers = ___)
  model$tot.withinss
})

# Generate a data frame containing both k and tot_withinss
elbow_df <- data.frame(
  k = ___,
  tot_withinss = ___
)

# Plot the elbow plot
ggplot(elbow_df, aes(x = ___, y = ___)) +
  geom_line() +
  scale_x_continuous(breaks = 1:10)
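For reference, the same workflow can be sketched end to end on data where the right answer is known in advance. The example below is a minimal sketch, not the exercise solution: the simulated matrix sim (three well-separated 2-D groups), the names sim_withinss and sim_elbow_df, and the choice of nstart = 20 are all hypothetical and introduced only for illustration.

```r
library(purrr)
library(ggplot2)

# Hypothetical stand-in data: three 2-D groups of 50 points each
set.seed(123)
group_centers <- rbind(c(0, 0), c(6, 0), c(3, 6))
sim <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(50, mean = group_centers[i, 1]),
        rnorm(50, mean = group_centers[i, 2]))
}))

# Fit one k-means model per k and keep its total within-cluster sum of squares;
# nstart = 20 (an assumption) reduces sensitivity to the random starting centers
sim_withinss <- map_dbl(1:10, function(k) {
  model <- kmeans(x = sim, centers = k, nstart = 20)
  model$tot.withinss
})

# Pair each k with its total within-cluster sum of squares
sim_elbow_df <- data.frame(k = 1:10, tot_withinss = sim_withinss)

# Line plot of total within-cluster sum of squares against k
ggplot(sim_elbow_df, aes(x = k, y = tot_withinss)) +
  geom_line() +
  scale_x_continuous(breaks = 1:10)
```

With groups this well separated, the curve should fall steeply up to k = 3 and flatten afterwards, which is the kind of "elbow" the exercise asks you to look for in the oes data.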