Aan de slagGa gratis aan de slag

Extracting the cluster labels

In the previous exercise, you saw that the intermediate clustering of the grain samples at height 6 has 3 clusters. Now, use the fcluster() function to extract the cluster labels for this intermediate clustering, and compare the labels with the grain varieties using a cross-tabulation.

The hierarchical clustering has already been performed and mergings is the result of the linkage() function. The list varieties gives the variety of each grain sample.

Deze oefening maakt deel uit van de cursus

Unsupervised Learning in Python

Cursus bekijken

Oefeninstructies

  • Import:
    • pandas as pd.
    • fcluster from scipy.cluster.hierarchy.
  • Perform a flat hierarchical clustering by using the fcluster() function on mergings. Specify a maximum height of 6 and the keyword argument criterion='distance'.
  • Create a DataFrame df with two columns named 'labels' and 'varieties', using labels and varieties, respectively, for the column values. This has been done for you.
  • Create a cross-tabulation ct between df['labels'] and df['varieties'] to count the number of times each grain variety coincides with each cluster label.

Praktische interactieve oefening

Probeer deze oefening eens door deze voorbeeldcode in te vullen.

# Perform the necessary imports
import ____ as ____
from ____ import ____

# Use fcluster to extract labels: labels
labels = ____

# Create a DataFrame with labels and varieties as columns: df
df = pd.DataFrame({'labels': labels, 'varieties': varieties})

# Create crosstab: ct
ct = ____

# Display ct
print(ct)
Code bewerken en uitvoeren