Silhouette analysis
Silhouette analysis allows you to calculate how similar each observation is to the cluster it is assigned to, relative to the other clusters. This metric, the silhouette width, ranges from -1 to 1 for each observation in your data and can be interpreted as follows:
- Values close to 1 suggest that the observation is well matched to its assigned cluster
- Values close to 0 suggest that the observation lies on the border between two clusters
- Values close to -1 suggest that the observation may be assigned to the wrong cluster
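The silhouette width of observation i is usually defined as s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from i to the other members of its own cluster and b(i) is the smallest mean distance from i to the members of any other cluster. The minimal sketch below computes this by hand and checks it against silhouette() from the cluster library. The simulated data, the seed, and the hand-assigned labels are illustrative assumptions, not part of the exercise.

```r
library(cluster)

# Hypothetical toy data: two well-separated groups of 10 points each in 2-D
set.seed(42)
x <- rbind(
  matrix(rnorm(20, mean = 0), ncol = 2),
  matrix(rnorm(20, mean = 5), ncol = 2)
)
cl <- rep(1:2, each = 10)   # cluster labels assigned by hand (no singleton clusters)
d <- as.matrix(dist(x))     # full Euclidean distance matrix

# s(i) = (b(i) - a(i)) / max(a(i), b(i))
manual_sil <- sapply(seq_len(nrow(x)), function(i) {
  own <- cl == cl[i]
  # a(i): mean distance to the other members of i's cluster (d[i, i] is 0)
  a <- sum(d[i, own]) / (sum(own) - 1)
  # b(i): smallest mean distance to the members of any other cluster
  others <- setdiff(unique(cl), cl[i])
  b <- min(sapply(others, function(k) mean(d[i, cl == k])))
  (b - a) / max(a, b)
})

# silhouette() computes the same widths from a label vector and a dist object
sil <- silhouette(cl, dist(x))
all.equal(manual_sil, unname(sil[, "sil_width"]))
```

Because a(i) and b(i) are both non-negative distances, the ratio can never leave the range -1 to 1, which is where the bounds in the list above come from.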
In this exercise you will use the pam() and silhouette() functions from the cluster library to perform silhouette analysis and compare the results of models with k = 2 and k = 3. Note that pam() (Partitioning Around Medoids) fits a k-medoids model, the medoid-based counterpart of k-means. You'll continue working with the lineup dataset.
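As a rough sketch of what these two functions return, the example below fits pam() to a small simulated data frame standing in for lineup (the data and seed are assumptions for illustration). silhouette() turns the fitted model into a matrix with one row per observation and the columns cluster, neighbor, and sil_width; the average width is also stored on the model as silinfo$avg.width.

```r
library(cluster)

# Hypothetical stand-in for lineup: 30 simulated points in three loose groups
set.seed(1)
toy <- data.frame(
  x = c(rnorm(10, 0), rnorm(10, 4), rnorm(10, 8)),
  y = c(rnorm(10, 0), rnorm(10, 4), rnorm(10, 0))
)

pam_fit <- pam(toy, k = 3)
sil <- silhouette(pam_fit)

# Rows come back grouped by cluster, so use the row names (not the row
# position) to identify the original observations
sil[1:6, c("cluster", "neighbor", "sil_width")]

# Average silhouette width, computed from the matrix and read from the model
mean(sil[, "sil_width"])
pam_fit$silinfo$avg.width
```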
Pay close attention to the silhouette plot: does each observation clearly belong to its assigned cluster for k = 3?
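One way to answer that question numerically rather than by eye is to list the observations whose width is negative (on average closer to a neighbouring cluster than to their own) or close to zero. The sketch below assumes simulated data with one point deliberately placed between two groups, plus an arbitrary 0.1 cutoff for "borderline"; neither comes from the course. It applies the default silhouette() method to the model's cluster labels, which keeps observations in their original order.

```r
library(cluster)

# Hypothetical toy data: two groups plus one point placed between them
set.seed(7)
toy <- data.frame(
  x = c(rnorm(10, 0), rnorm(10, 6), 3),
  y = c(rnorm(10, 0), rnorm(10, 6), 3)
)

pam_fit <- pam(toy, k = 2)

# Default method: labels + Euclidean distances (pam()'s default metric);
# row i of the result is observation i of toy
sil <- silhouette(pam_fit$clustering, dist(toy))
widths <- sil[, "sil_width"]

# Observations that may be assigned to the wrong cluster
which(widths < 0)

# Borderline observations (the 0.1 cutoff is an arbitrary choice)
which(abs(widths) < 0.1)
```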
This exercise is part of the course “Cluster Analysis in R”.
Exercise instructions
- Generate a k-medoids model pam_k2 using pam() with k = 2 on the lineup data.
- Plot the silhouette analysis using plot(silhouette(model)).
- Repeat the first two steps for k = 3, saving the model as pam_k3.
- Before proceeding, review the differences between the two plots, paying particular attention to observation 3 in pam_k3.
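The visual comparison in these steps can be complemented by a single number per model: the average silhouette width, where a higher value suggests better-separated clusters. The sketch below computes it for k = 2 and k = 3 with sapply(). The simulated data frame and seed are stand-ins for lineup, so the values it prints say nothing about the exercise's own result.

```r
library(cluster)

# Hypothetical toy data with three groups
set.seed(3)
toy <- data.frame(
  x = c(rnorm(15, 0), rnorm(15, 5), rnorm(15, 10)),
  y = c(rnorm(15, 0), rnorm(15, 5), rnorm(15, 0))
)

# Fit one pam() model per k and keep its average silhouette width
ks <- 2:3
avg_widths <- sapply(ks, function(k) pam(toy, k = k)$silinfo$avg.width)
names(avg_widths) <- paste0("k=", ks)
avg_widths

# The k with the highest average width is the better-separated clustering
best_k <- ks[which.max(avg_widths)]
best_k
```

The average hides individual observations, so it is worth reading alongside the plot rather than instead of it.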
Hands-on interactive exercise
Work through the exercise using the following sample code.
```r
library(cluster)
# Assumes the lineup data frame is already loaded, as in the course environment

# Generate a k-medoids model using the pam() function with k = 2
pam_k2 <- pam(lineup, k = 2)
# Plot the silhouette visual for the pam_k2 model
plot(silhouette(pam_k2))

# Generate a k-medoids model using the pam() function with k = 3
pam_k3 <- pam(lineup, k = 3)
# Plot the silhouette visual for the pam_k3 model
plot(silhouette(pam_k3))
```