Feature impact on cluster quality
Explore how individual features affect the clustering performance of a KMeans model. The dataset X is used for customer segmentation based on three features: income, number of kids, and number of teens in the household.
The silhouette_score function and the column_names variable have been pre-loaded for you.
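For reference, the silhouette coefficient of a sample i compares a(i), its mean distance to the other points in its own cluster, with b(i), its mean distance to the points of the nearest other cluster. silhouette_score returns the mean of s(i) over all n samples, so scores lie between -1 and 1, and higher is better. The second line is an assumption, since the exercise does not define "impact": it takes the impact of feature j to be the drop in score when that feature is removed and the model is refit, where X_{-j} denotes X without column j.

$$
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad S(X) = \frac{1}{n}\sum_{i=1}^{n} s(i)
$$

$$
\text{impact}_j = S(X) - S(X_{-j})
$$

Under this assumed definition, a positive impact means the clustering separates less cleanly without feature j, and a negative impact means it separates more cleanly.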
This exercise is part of the course Explainable AI in Python.
Exercise instructions
- Derive the original silhouette score (original_score).
- In the for loop, remove features one by one and save the result in X_reduced.
- Compute the new silhouette score (new_score).
- Compute the impact of the feature.
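To illustrate the feature-removal step, here is a small sketch assuming X is a NumPy array. The exercise indexes it with X.shape[1] but does not state its type. np.delete with axis=1 returns a copy with one column dropped and leaves the input unchanged. The array X_demo and its values are hypothetical stand-ins for the real data.

```python
import numpy as np

# Hypothetical stand-in for X: 4 customers x 3 features (income, kids, teens)
X_demo = np.array([
    [50_000, 1, 0],
    [72_000, 2, 1],
    [31_000, 0, 0],
    [88_000, 3, 2],
])

reduced_shapes = []
for i in range(X_demo.shape[1]):
    # Drop column i; np.delete returns a new array, X_demo is not modified
    X_reduced = np.delete(X_demo, i, axis=1)
    reduced_shapes.append(X_reduced.shape)
    print(i, X_reduced.shape)
```

Each iteration keeps all four rows and two of the three columns. Because the original array is never modified, the next iteration again starts from the full feature set.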
Have a go at this exercise by completing this sample code.
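```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score  # also pre-loaded in the exercise

# X (feature matrix) and column_names are pre-loaded in the exercise environment
kmeans = KMeans(n_clusters=5, random_state=10, n_init=10).fit(X)
# Derive the original silhouette score
original_score = ____
for i in range(X.shape[1]):
    # Remove feature at index i
    X_reduced = ____
    # Refit the model on the reduced feature set (this overwrites kmeans.labels_)
    kmeans.fit(X_reduced)
    # Compute the new silhouette score
    new_score = ____
    # Compute the feature's impact
    impact = ____
    print(f'Feature {column_names[i]}: Impact = {impact}')
```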
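One possible completion of the sample code is sketched below. It makes several assumptions: that X is a NumPy array, so np.delete can drop a column; that impact is original_score - new_score, which the exercise does not spell out; and that new_score is computed on X_reduced with the refit labels. Because the real X is not available here, the sketch generates a hypothetical three-feature dataset with make_blobs. The printed values will not match the exercise's output.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical stand-ins for the pre-loaded X and column_names
X, _ = make_blobs(n_samples=300, n_features=3, centers=5, random_state=0)
column_names = ['income', 'kids', 'teens']

kmeans = KMeans(n_clusters=5, random_state=10, n_init=10).fit(X)
# Score the full-feature clustering before kmeans is refit (fit() overwrites labels_)
original_score = silhouette_score(X, kmeans.labels_)

impacts = {}
for i in range(X.shape[1]):
    # Remove feature at index i (returns a new array; X is unchanged)
    X_reduced = np.delete(X, i, axis=1)
    kmeans.fit(X_reduced)
    # Score the new clustering in the reduced feature space
    new_score = silhouette_score(X_reduced, kmeans.labels_)
    # Assumed definition: positive means the score drops when this feature is removed
    impact = original_score - new_score
    impacts[column_names[i]] = impact
    print(f'Feature {column_names[i]}: Impact = {impact}')
```

The order matters here. original_score has to be computed before the loop, because each call to kmeans.fit replaces the labels from the full-feature model. Note also that each new_score is measured in a different feature space, so the impacts are best read as a rough ranking rather than as exact contributions.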