Elbow (Scree) plot
In the previous exercise, you calculated the total within-cluster sum of squares for values of k from 1 to 10. You can visualize how this quantity changes with k using a line plot, known as an elbow plot (or scree plot).
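The quantity on the y-axis of an elbow plot, which `kmeans()` reports as `tot.withinss`, is the sum over all clusters of the squared Euclidean distances between each observation and the centroid of its cluster. As a minimal sketch, assuming simulated data in place of the course's `lineup` data (the matrix `pts` and the seed are hypothetical), the following recomputes it by hand and compares it with the reported value:

```r
# Hypothetical example data: 60 random points in 2D (stand-in for `lineup`)
set.seed(42)
pts <- matrix(rnorm(120), ncol = 2)

model <- kmeans(pts, centers = 3, nstart = 10)

# Row i holds the centroid of the cluster that point i was assigned to
assigned_centers <- model$centers[model$cluster, , drop = FALSE]

# Sum of squared distances from every point to its own centroid
manual_tot_withinss <- sum((pts - assigned_centers)^2)

# Should match the value kmeans() reports, up to floating-point error
all.equal(manual_tot_withinss, model$tot.withinss)
```

`all.equal()` is used instead of `==` because the two sums may differ slightly through floating-point rounding.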
When looking at an elbow plot, you want to see a sharp decline in the total within-cluster sum of squares from one k to the next, followed by a much more gradual decrease. The last value of k before the curve levels off suggests a "good" value of k.
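Reading the elbow by eye is the approach used in this exercise. As a rough cross-check, one can also look for the k at which the decrease in total within-cluster sum of squares changes the most. This is a simple second-difference heuristic, swapped in here for illustration and not something the course prescribes. The sketch below assumes simulated data with three well-separated groups (`pts` and `centers_true` are hypothetical names) and uses base R's `vapply()` in place of `purrr::map_dbl()` so that it runs without extra packages:

```r
# Hypothetical simulated data: three well-separated groups of 50 points each
set.seed(123)
centers_true <- rbind(c(0, 0), c(10, 0), c(0, 10))
pts <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(50, mean = centers_true[i, 1]),
        rnorm(50, mean = centers_true[i, 2]))
}))

# Total within-cluster sum of squares for k = 1 to 10
# (vapply() is used here as a base R counterpart of purrr::map_dbl())
tot_withinss <- vapply(1:10, function(k) {
  kmeans(pts, centers = k, nstart = 25, iter.max = 50)$tot.withinss
}, numeric(1))

# drops[i]: decrease in tot_withinss when moving from k = i to k = i + 1
drops <- -diff(tot_withinss)

# bend[j]: how much smaller the next drop is, evaluated at k = j + 1
# (a large value means a steep segment is followed by a flat one)
bend <- drops[-length(drops)] - drops[-1]

# The k with the largest bend is this heuristic's elbow
elbow_k <- which.max(bend) + 1
elbow_k
```

For clearly separated groups like these, the bend should appear at k = 3. On real data the curve is often smoother, so treat the result as a suggestion to check against the plot rather than a definitive answer. Setting `nstart = 25` makes `kmeans()` keep the best of 25 random starts for each k, which reduces the chance that a poor initialization distorts the curve.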
This exercise is part of the course “Cluster Analysis in R”.
Exercise instructions
- Continuing your work from the previous exercise, use the values in elbow_df to create a line plot showing the relationship between k and the total within-cluster sum of squares.
The sample code below continues from the previous exercise:
# Load the packages used below
library(purrr)
library(ggplot2)

# Use map_dbl to run many models with varying values of k (centers)
# (assumes the `lineup` data frame from the earlier exercises is loaded)
tot_withinss <- map_dbl(1:10, function(k) {
  model <- kmeans(x = lineup, centers = k)
  model$tot.withinss
})

# Generate a data frame containing both k and tot_withinss
elbow_df <- data.frame(
  k = 1:10,
  tot_withinss = tot_withinss
)

# Plot the elbow plot: k on the x-axis, tot_withinss on the y-axis
ggplot(elbow_df, aes(x = k, y = tot_withinss)) +
  geom_line() +
  scale_x_continuous(breaks = 1:10)
In this chapter, you will build an understanding of the principles behind the k-means algorithm, learn how to select the right k when it isn't previously known, and revisit the wholesale data from a different perspective.