K-means: Average Silhouette Widths
So far, hierarchical clustering has resulted in 3 clusters, while the elbow method suggests 2. In this exercise, use average silhouette widths to explore what the "best" value of k should be.
This exercise is part of the course Cluster Analysis in R.
Exercise instructions
- Use map_dbl() to run pam() using the oes data for k values ranging from 2 to 10 and extract the average silhouette width value from each model: model$silinfo$avg.width. Store the resulting vector as sil_width.
- Build a new data frame sil_df containing the values of k and the vector of average silhouette widths.
- Use the values in sil_df to draw a line plot showing the relationship between k and average silhouette width.
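As a rough illustration of what model$silinfo$avg.width holds, the sketch below fits a single pam() model and compares the stored average with the mean of the per-observation silhouette widths. The toy matrix is a hypothetical stand-in, not the oes data used in the course, and the seed and group locations are arbitrary assumptions.

```r
library(cluster)

set.seed(42)

# Hypothetical toy data (NOT the course's oes data):
# two well-separated groups of 20 points in two dimensions
toy <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.5), ncol = 2)
)

# Fit a single k-medoids model with k = 2
model <- pam(toy, k = 2)

# Per-observation silhouette information: cluster, neighbor, sil_width
head(model$silinfo$widths)

# Average silhouette width across all observations
avg_width <- model$silinfo$avg.width

# The stored average should match the mean of the sil_width column
manual_avg <- mean(model$silinfo$widths[, "sil_width"])
```

Silhouette widths are only defined when there are at least two clusters, which is why the range of k in the exercise starts at 2.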
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Use map_dbl to run many models with varying value of k
sil_width <- map_dbl(2:10, function(k){
  model <- pam(___, k = ___)
  model$silinfo$avg.width
})

# Generate a data frame containing both k and sil_width
sil_df <- data.frame(
  k = ___,
  sil_width = ___
)

# Plot the relationship between k and sil_width
ggplot(___, aes(x = ___, y = ___)) +
  geom_line() +
  scale_x_continuous(breaks = 2:10)
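
One possible way to read such a sweep once it is built is to pick the k with the highest average silhouette width. The sketch below does this on hypothetical toy data (not the oes data); the names toy, k_values, and best_k, the narrower range of k, and the seed are illustrative assumptions rather than part of the exercise.

```r
library(cluster)
library(purrr)

set.seed(1)

# Hypothetical toy data (NOT the course's oes data):
# three groups of 30 points each, centered at 0, 6, and 12
centers <- c(0, 6, 12)
toy <- do.call(rbind, lapply(centers, function(m) {
  matrix(rnorm(60, mean = m, sd = 0.5), ncol = 2)
}))

# Fit one pam() model per candidate k and keep its average silhouette width
k_values <- 2:6
sil_width <- map_dbl(k_values, function(k) {
  pam(toy, k = k)$silinfo$avg.width
})

# Combine k and the widths into a data frame
sil_df <- data.frame(k = k_values, sil_width = sil_width)

# Candidate "best" k: the one with the largest average silhouette width
best_k <- sil_df$k[which.max(sil_df$sil_width)]
print(sil_df)
print(best_k)
```

The maximum is only a heuristic; it is still worth looking at the whole line plot, since nearby values of k can have very similar average widths.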