Feature impact on cluster quality

Explore how individual features affect the clustering quality of a KMeans model. The dataset X is used for customer segmentation and contains three features: income, number of kids, and number of teens in the household.

The silhouette_score function and the column_names variable have been pre-loaded for you.
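As a quick illustration of what silhouette_score measures, the sketch below scores a KMeans clustering on hypothetical toy data (X_toy is made up here and is not the exercise's X). The function returns the mean silhouette coefficient over all samples, a value between -1 and 1, where higher means tighter and better-separated clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical toy data (not the exercise's X): two well-separated blobs.
rng = np.random.default_rng(0)
X_toy = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 2)),
    rng.normal(5.0, 0.5, size=(50, 2)),
])

# Cluster the points, then score the resulting labels against the data.
labels = KMeans(n_clusters=2, random_state=10, n_init=10).fit_predict(X_toy)
score = silhouette_score(X_toy, labels)
print(f"Silhouette score: {score:.3f}")
```

Because the two blobs barely overlap, the score should land close to 1; overlapping clusters would pull it toward 0.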

This exercise is part of the course Explainable AI in Python.


Exercise instructions

Derive the original silhouette score (original_score).
In the for loop, remove features one by one and save the result in X_reduced.
Compute the new silhouette score (new_score).
Compute the impact of the feature.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

kmeans = KMeans(n_clusters=5, random_state=10, n_init=10).fit(X)
# Derive the original silhouette score
original_score = ____

for i in range(X.shape[1]):
    # Remove feature at index i
    X_reduced = ____
    kmeans.fit(X_reduced)
    # Compute the new silhouette score
    new_score = ____
    # Compute the feature's impact
    impact = ____
    print(f'Feature {column_names[i]}: Impact = {impact}')
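One way the blanks above might be filled is sketched below. It runs on hypothetical synthetic data rather than the exercise's real X, and feature_impacts is an illustrative helper name, not part of the course. Two assumptions are worth flagging: X is treated as a NumPy array (so np.delete applies), and impact is defined as original_score minus new_score, which the exercise text does not state explicitly.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical stand-in data; the real X and column_names are pre-loaded in the exercise.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(50_000, 15_000, 300),  # income
    rng.integers(0, 3, 300),          # number of kids
    rng.integers(0, 3, 300),          # number of teens
]).astype(float)
column_names = ["income", "kids", "teens"]


def feature_impacts(X, column_names, n_clusters=5, random_state=10):
    """Refit KMeans with each feature removed and compare silhouette scores."""
    kmeans = KMeans(n_clusters=n_clusters, random_state=random_state, n_init=10).fit(X)
    # Baseline: silhouette score with all features present.
    original_score = silhouette_score(X, kmeans.labels_)

    impacts = {}
    for i in range(X.shape[1]):
        # Drop column i (assumes a NumPy array; a DataFrame would need .drop instead).
        X_reduced = np.delete(X, i, axis=1)
        kmeans.fit(X_reduced)
        new_score = silhouette_score(X_reduced, kmeans.labels_)
        # Assumed definition: positive impact means the score fell without the feature.
        impacts[column_names[i]] = original_score - new_score
    return original_score, impacts


original_score, impacts = feature_impacts(X, column_names)
for name, impact in impacts.items():
    print(f"Feature {name}: Impact = {impact}")
```

Under this sign convention, a positive impact suggests the feature helped cluster separation, while a negative one suggests the clusters are cleaner without it.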
