K-means: Average Silhouette Widths

So far, hierarchical clustering has resulted in 3 clusters, while the elbow method suggests 2. In this exercise, use average silhouette widths to explore what the "best" value of k should be.
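As a reminder of what the average silhouette width measures, the sketch below fits a single pam() model and checks that model$silinfo$avg.width is simply the mean of the per-observation silhouette widths. The toy matrix is made up for illustration and is not the oes data; the names toy, model, and sil are assumptions introduced here.

```r
library(cluster)

# Hypothetical toy data: three loose groups in two dimensions (not the oes data)
set.seed(42)
toy <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 5), ncol = 2),
  matrix(rnorm(40, mean = 10), ncol = 2)
)

# Fit one PAM model with k = 3
model <- pam(toy, k = 3)

# Per-observation silhouette widths; each value lies between -1 and 1
sil <- silhouette(model)

# The average silhouette width is the mean of those values
mean_width <- mean(sil[, "sil_width"])
same_value <- isTRUE(all.equal(mean_width, model$silinfo$avg.width))
```

Values close to 1 indicate observations that sit well inside their cluster, values near 0 indicate observations between clusters, and negative values suggest an observation may fit a neighboring cluster better.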

This exercise is part of the course

Cluster Analysis in R

Exercise instructions

  • Use map_dbl() to run pam() using the oes data for k values ranging from 2 to 10 and extract the average silhouette width value from each model: model$silinfo$avg.width. Store the resulting vector as sil_width.
  • Build a new data frame sil_df containing the values of k and the vector of average silhouette widths.
  • Use the values in sil_df to plot a line plot showing the relationship between k and average silhouette width.
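The steps above can be sketched on made-up data as follows. This is only an illustration under stated assumptions: toy is a hypothetical matrix standing in for oes, and best_k is an extra name introduced here to read off the k with the highest average silhouette width.

```r
library(cluster)
library(purrr)

# Hypothetical toy data standing in for oes
set.seed(1)
toy <- rbind(
  matrix(rnorm(60, mean = 0), ncol = 2),
  matrix(rnorm(60, mean = 6), ncol = 2),
  matrix(rnorm(60, mean = 12), ncol = 2)
)

# Fit one PAM model per k and keep only its average silhouette width
sil_width <- map_dbl(2:10, function(k) {
  model <- pam(toy, k = k)
  model$silinfo$avg.width
})

# Pair each k with its average silhouette width
sil_df <- data.frame(
  k = 2:10,
  sil_width = sil_width
)

# The k with the largest average silhouette width is the candidate "best" k
best_k <- sil_df$k[which.max(sil_df$sil_width)]
```

Plotting sil_df as a line plot shows the same information visually: the peak of the line marks the value of k that this criterion favors.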

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Use map_dbl to run many models with varying value of k
sil_width <- map_dbl(2:10, function(k) {
  model <- pam(___, k = ___)
  model$silinfo$avg.width
})

# Generate a data frame containing both k and sil_width
sil_df <- data.frame(
  k = ___,
  sil_width = ___
)

# Plot the relationship between k and sil_width
ggplot(___, aes(x = ___, y = ___)) +
  geom_line() +
  scale_x_continuous(breaks = 2:10)