
Elbow (Scree) plot

In the previous exercises, you calculated the total within-cluster sum of squares for values of k ranging from 1 to 10. You can visualize this relationship with a line plot, known as an elbow plot (or scree plot).
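The total within-cluster sum of squares is the sum, over all clusters, of the squared Euclidean distances from each observation to the centre of its assigned cluster; `kmeans()` reports it as `tot.withinss`. The sketch below checks this on a small random matrix (made-up data, not the course's `lineup`) by recomputing the value from the fitted centres and cluster assignments.

```r
# Made-up toy data: 20 random points in two dimensions
set.seed(1)
toy_pts <- matrix(rnorm(40), ncol = 2)

# Fit k-means with 3 clusters
toy_model <- kmeans(toy_pts, centers = 3, nstart = 10)

# Each point minus the centre of the cluster it was assigned to
resid <- toy_pts - toy_model$centers[toy_model$cluster, ]

# Sum of squared residuals over all points and dimensions
manual_tot <- sum(resid^2)

# Should agree with kmeans()'s own value up to floating-point tolerance
all.equal(manual_tot, toy_model$tot.withinss)
```

`tot.withinss` is also the sum of the per-cluster values in `withinss`.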

When looking at an elbow plot, you want to see a sharp decline in total within-cluster sum of squares from one k to the next, followed by a much more gradual decrease. The last value of k before the curve levels off (the "elbow") suggests a "good" value of k.
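The "last k before the curve levels off" rule can be made explicit. The sketch below is a hypothetical helper, `pick_elbow()`, which is not part of the course or of any package: it returns the smallest k after which adding one more cluster reduces the total within-cluster sum of squares by less than a fixed fraction of the k = 1 value. The 10% default threshold is an arbitrary assumption; reading the plot by eye is still the approach this exercise describes.

```r
# Hypothetical helper (an assumption, not a course or package function).
# k:            vector of k values, in increasing order
# tot_withinss: total within-cluster sum of squares for each k
# threshold:    arbitrary fraction of the k = 1 value that counts as "small"
pick_elbow <- function(k, tot_withinss, threshold = 0.1) {
  # drops[i] is the reduction gained by moving from k[i] to k[i + 1]
  drops <- -diff(tot_withinss)
  small <- drops < threshold * tot_withinss[1]
  # If the curve never levels off within the range, return the largest k
  if (!any(small)) return(max(k))
  # First k after which the improvement becomes small
  k[which(small)[1]]
}

# Made-up values: big drops up to k = 3, then the curve flattens
pick_elbow(1:6, c(100, 40, 15, 12, 10, 9))  # returns 3
```

With the `elbow_df` from this exercise you could call `pick_elbow(elbow_df$k, elbow_df$tot_withinss)` and compare the result with the elbow you see in the plot.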

This is a part of the course

“Cluster Analysis in R”


Exercise instructions

  • Continuing your work from the previous exercise, use the values in elbow_df to create a line plot showing the relationship between k and the total within-cluster sum of squares.

Work through this exercise using the sample code below.

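# Load the packages used below: purrr for map_dbl(), ggplot2 for plotting
library(purrr)
library(ggplot2)

# `lineup` is the data set from the previous exercises; it must already exist
# Use map_dbl to run many models with varying value of k (centers)
tot_withinss <- map_dbl(1:10, function(k) {
  model <- kmeans(x = lineup, centers = k)
  model$tot.withinss
})

# Generate a data frame containing both k and tot_withinss
elbow_df <- data.frame(
  k = 1:10,
  tot_withinss = tot_withinss
)

# Plot the elbow plot: k on the x-axis, total within-cluster SS on the y-axis
ggplot(elbow_df, aes(x = k, y = tot_withinss)) +
  geom_line() +
  scale_x_continuous(breaks = 1:10)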
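Because `lineup` only exists inside the course session, here is a self-contained sketch of the same workflow on simulated data: three made-up, well-separated clusters of 20 points each. It swaps `purrr::map_dbl()` for base R `sapply()` and `ggplot2` for base `plot()` so it runs without extra packages, and it adds `nstart = 25` (an assumption, not in the exercise) so each k keeps the best of 25 random starts, which makes the curve less noisy.

```r
# Simulated data (made up for illustration): three blobs in two dimensions
set.seed(123)
toy <- rbind(
  cbind(rnorm(20, mean = 0, sd = 0.5), rnorm(20, mean = 0,  sd = 0.5)),
  cbind(rnorm(20, mean = 5, sd = 0.5), rnorm(20, mean = 5,  sd = 0.5)),
  cbind(rnorm(20, mean = 0, sd = 0.5), rnorm(20, mean = 10, sd = 0.5))
)

# Total within-cluster sum of squares for k = 1..10,
# keeping the best of 25 random starts for each k
toy_withinss <- sapply(1:10, function(k) {
  kmeans(toy, centers = k, nstart = 25)$tot.withinss
})

toy_elbow_df <- data.frame(k = 1:10, tot_withinss = toy_withinss)

# Line plot with points; for these data the elbow should appear at k = 3
plot(toy_elbow_df$k, toy_elbow_df$tot_withinss, type = "b", xaxt = "n",
     xlab = "k", ylab = "Total within-cluster sum of squares")
axis(1, at = 1:10)
```

Since the data were generated from three groups, the steep drop through k = 3 followed by a nearly flat tail is what the elbow heuristic is meant to pick up; real data rarely give such a clean bend.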


Develop a strong intuition for how hierarchical and k-means clustering work and learn how to apply them to extract insights from your data.

In this chapter, you will build an understanding of the principles behind the k-means algorithm, learn how to select the right k when it isn't known in advance, and revisit the wholesale data from a different perspective.

  • Exercise 1: Introduction to K-means
  • Exercise 2: K-means on a soccer field
  • Exercise 3: K-means on a soccer field (part 2)
  • Exercise 4: Evaluating different values of K by eye
  • Exercise 5: Many K's many models
  • Exercise 6: Elbow (Scree) plot
  • Exercise 7: Interpreting the elbow plot
  • Exercise 8: Silhouette analysis: observation level performance
  • Exercise 9: Silhouette analysis
  • Exercise 10: Making sense of the K-means clusters
  • Exercise 11: Revisiting wholesale data: "Best" k
  • Exercise 12: Revisiting wholesale data: Exploration
