Get startedGet started for free

Extracting the cluster labels

In the previous exercise, you saw that the intermediate clustering of the grain samples at height 6 has 3 clusters. Now, use the fcluster() function to extract the cluster labels for this intermediate clustering, and compare the labels with the grain varieties using a cross-tabulation.

The hierarchical clustering has already been performed and mergings is the result of the linkage() function. The list varieties gives the variety of each grain sample.

This exercise is part of the course

Unsupervised Learning in Python

View Course

Exercise instructions

  • Import:
    • pandas as pd.
    • fcluster from scipy.cluster.hierarchy.
  • Perform a flat hierarchical clustering by using the fcluster() function on mergings. Specify a maximum height of 6 and the keyword argument criterion='distance'.
  • Create a DataFrame df with two columns named 'labels' and 'varieties', using labels and varieties, respectively, for the column values. This has been done for you.
  • Create a cross-tabulation ct between df['labels'] and df['varieties'] to count the number of times each grain variety coincides with each cluster label.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Perform the necessary imports
import ____ as ____
from ____ import ____

# Use fcluster to extract labels: labels
labels = ____

# Create a DataFrame with labels and varieties as columns: df
df = pd.DataFrame({'labels': labels, 'varieties': varieties})

# Create crosstab: ct
ct = ____

# Display ct
print(ct)
Edit and Run Code