Silhouette analysis

Silhouette analysis lets you calculate how similar each observation is to the cluster it is assigned to, relative to the other clusters. This metric (silhouette width) ranges from -1 to 1 for each observation in your data and can be interpreted as follows:

Values close to 1 suggest that the observation is well matched to the assigned cluster
Values close to 0 suggest that the observation is borderline matched between two clusters
Values close to -1 suggest that the observation may be assigned to the wrong cluster
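
To make the interpretation above concrete, here is a minimal sketch of how a silhouette width can be computed by hand using the standard definition s(i) = (b - a) / max(a, b), where a is the mean distance from an observation to the other members of its own cluster and b is the mean distance to the nearest other cluster. The toy vector `x`, the labels `cl`, and the helper `silhouette_width()` are hypothetical and exist only for illustration; they are not part of the exercise or of the cluster package.

```r
# Toy one-dimensional data with two hypothetical clusters (illustrative only)
x  <- c(1, 2, 3, 10, 11)
cl <- c(1, 1, 1, 2, 2)

# Pairwise Euclidean distances as a full matrix
d <- as.matrix(dist(x))

# Hypothetical helper: silhouette width of observation i
silhouette_width <- function(i, d, cl) {
  own <- cl == cl[i]
  own[i] <- FALSE                       # exclude the observation itself
  if (!any(own)) return(0)              # singleton cluster: width taken as 0
  a <- mean(d[i, own])                  # mean distance within its own cluster
  others <- setdiff(unique(cl), cl[i])
  # mean distance to each other cluster; keep the nearest one
  b <- min(sapply(others, function(k) mean(d[i, cl == k])))
  (b - a) / max(a, b)
}

widths <- sapply(seq_along(x), silhouette_width, d = d, cl = cl)
round(widths, 3)
```

Because the two toy clusters are far apart, every width comes out close to 1, which corresponds to the "well matched" case in the list above.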

In this exercise you will use the pam() and silhouette() functions from the cluster package to perform silhouette analysis and compare the results of models with k = 2 and k = 3. You'll continue working with the lineup dataset.

Pay close attention to the silhouette plot: does each observation clearly belong to its assigned cluster for k = 3?
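
Alongside the plot, the silhouette widths can also be inspected numerically. The following is a rough sketch under stated assumptions: the matrix `toy` is randomly generated stand-in data (not the lineup dataset), and the names `fit_k2`, `fit_k3` and `sil_k3` are made up for this example.

```r
library(cluster)

# Hypothetical toy data standing in for the course's lineup data:
# two groups of 10 points each in two dimensions
set.seed(42)
toy <- rbind(
  matrix(rnorm(20, mean = 0), ncol = 2),
  matrix(rnorm(20, mean = 5), ncol = 2)
)

# Fit PAM (k-medoids) models for k = 2 and k = 3
fit_k2 <- pam(toy, k = 2)
fit_k3 <- pam(toy, k = 3)

# One row per observation: assigned cluster, nearest neighbor cluster, width
sil_k3 <- silhouette(fit_k3)
head(sil_k3[, c("cluster", "neighbor", "sil_width")])

# Average silhouette width of each model
fit_k2$silinfo$avg.width
fit_k3$silinfo$avg.width

# Labels of observations with a negative width, if any. Row names are used
# because the rows of a pam silhouette are grouped by cluster rather than
# kept in input order.
rownames(sil_k3)[sil_k3[, "sil_width"] < 0]
```

Comparing the two average widths gives a single-number summary of each model, while the per-observation rows point to the individual cases that sit poorly in their assigned cluster.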

This exercise is part of the course

Cluster Analysis in R

Exercise instructions

Generate a PAM (k-medoids) model pam_k2 using pam() with k = 2 on the lineup data.
Plot the silhouette analysis using plot(silhouette(model)).
Repeat the first two steps for k = 3, saving the model as pam_k3.
Before proceeding, review the differences between the two plots, paying particular attention to observation 3 in the pam_k3 plot.

Hands-on interactive exercise

Try this exercise by completing the sample code.

library(cluster)

# Generate a PAM (k-medoids) model using the pam() function with k = 2
pam_k2 <- pam(___, k = ___)

# Plot the silhouette visual for the pam_k2 model
plot(silhouette(___))

# Generate a PAM (k-medoids) model using the pam() function with k = 3
pam_k3 <- ___

# Plot the silhouette visual for the pam_k3 model
