Elbow (Scree) plot
In the previous exercises you have calculated the total within-cluster sum of squares for values of k ranging from 1 to 10. You can visualize this relationship using a line plot to create what is known as an elbow plot (or scree plot).
When looking at an elbow plot you want to see a sharp decline from one k to another followed by a more gradual decrease in slope. The last value of k before the slope of the plot levels off suggests a "good" value of k.
This is a part of the course
“Cluster Analysis in R”
Exercise instructions
- Continuing your work from the previous exercise, use the values in
elbow_df
to plot a line plot showing the relationship between k and total within-cluster sum of squares.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Use map_dbl to run many models with varying value of k (centers)
tot_withinss <- map_dbl(1:10, function(k){
model <- kmeans(x = lineup, centers = k)
model$tot.withinss
})
# Generate a data frame containing both k and tot_withinss
elbow_df <- data.frame(
k = 1:10,
tot_withinss = tot_withinss
)
# Plot the elbow plot
ggplot(___, aes(x = ___, y = ___)) +
geom_line() +
scale_x_continuous(breaks = 1:10)