Feature importance in clustering with ARI
Use the Adjusted Rand Index (ARI) to quantify how much removing each feature changes the cluster assignments for the customer dataset from the previous exercise, which is pre-loaded in X.
The adjusted_rand_score() function and the column_names variable have been pre-loaded for you.
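As a quick illustration of what the ARI measures, the sketch below compares small hand-made label lists (invented for this example, not taken from the customer dataset). It assumes scikit-learn is installed. The score depends only on how points are grouped, not on which integer names each group, so two labelings that describe the same partition score 1.0.

```python
from sklearn.metrics import adjusted_rand_score

# Toy label lists, invented purely for illustration
labels_a = [0, 0, 1, 1]
labels_b = [1, 1, 0, 0]  # same grouping as labels_a, label names swapped
labels_c = [0, 1, 0, 1]  # a different grouping of the same four points

same_partition = adjusted_rand_score(labels_a, labels_b)
different_partition = adjusted_rand_score(labels_a, labels_c)

print(same_partition)       # 1.0: identical partitions
print(different_partition)  # below 1: the partitions disagree
```

This is why the ARI suits the exercise: K-means may number its clusters differently after a feature is removed, and the ARI ignores that renumbering.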
This exercise is part of the course
Explainable AI in Python
Exercise instructions
- Derive the original cluster assignments in original_clusters.
- In the for loop, remove features one by one and save the result in X_reduced.
- Derive the reduced_clusters by applying K-means on X_reduced.
- Compute the feature importance based on the ARI between the reduced_clusters and the original_clusters.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
kmeans = KMeans(n_clusters=5, random_state=10, n_init=10).fit(X)
# Derive original clusters
original_clusters = ____

for i in range(X.shape[1]):
    # Remove feature at index i
    X_reduced = ____
    # Derive reduced clusters
    reduced_clusters = ____
    # Derive feature importance
    importance = ____
    print(f'{column_names[i]}: {importance}')
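One possible way to complete the template is sketched below. It is not necessarily the course's reference solution and rests on several assumptions: X is a NumPy array (a pandas DataFrame would need something like X.drop(columns=...) instead of np.delete), importance is defined as 1 - ARI so that a larger change in the clustering gives a larger score, and the data and column names are synthetic stand-ins for the pre-loaded customer dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for the pre-loaded data (assumption: X is a NumPy array)
X, _ = make_blobs(n_samples=300, centers=5, n_features=3, random_state=0)
column_names = ["feature_a", "feature_b", "feature_c"]  # hypothetical names

kmeans = KMeans(n_clusters=5, random_state=10, n_init=10).fit(X)
# Cluster assignments using all features
original_clusters = kmeans.predict(X)

importances = {}
for i in range(X.shape[1]):
    # Drop column i, keeping every other feature
    X_reduced = np.delete(X, i, axis=1)
    # Re-cluster the reduced data with the same K-means settings
    reduced_clusters = KMeans(
        n_clusters=5, random_state=10, n_init=10
    ).fit_predict(X_reduced)
    # Assumed definition: low agreement with the original clusters = high importance
    importance = 1 - adjusted_rand_score(original_clusters, reduced_clusters)
    importances[column_names[i]] = importance
    print(f'{column_names[i]}: {importance}')
```

With this definition, a value near 0 means the clusters barely changed when the feature was removed, while larger values mean the feature had more influence on the original assignments.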