Extracting the cluster labels
In the previous exercise, you saw that the intermediate clustering of the grain samples at height 6 has 3 clusters. Now, use the fcluster() function to extract the cluster labels for this intermediate clustering, and compare the labels with the grain varieties using a cross-tabulation.
The hierarchical clustering has already been performed, and mergings is the result of the linkage() function. The list varieties gives the variety of each grain sample.
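To make the role of mergings concrete, here is a minimal sketch of how a linkage result can be cut at a fixed height with fcluster(). The points array is made-up toy data (not the grain samples), and method='complete' is an assumption, since the linkage method used for the exercise is not stated here.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: three well-separated groups of 2-D points
points = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
    [20.0, 0.0], [20.0, 1.0], [21.0, 0.0],
])

# Build the hierarchy (assumed method: complete linkage)
mergings = linkage(points, method='complete')

# Cut the hierarchy at height 6: merges above this distance are not applied
labels = fcluster(mergings, 6, criterion='distance')

# One integer label per sample; fcluster labels start at 1
n_clusters = len(set(labels))
print(labels, n_clusters)
```

With criterion='distance', the second argument is read as a height threshold on the dendrogram, so points whose merge distance is at most 6 end up sharing a label.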
This exercise is part of the course Unsupervised Learning in Python.
Exercise instructions
- Import pandas as pd, and fcluster from scipy.cluster.hierarchy.
- Perform a flat hierarchical clustering by using the fcluster() function on mergings. Specify a maximum height of 6 and the keyword argument criterion='distance'.
- Create a DataFrame df with two columns named 'labels' and 'varieties', using labels and varieties, respectively, for the column values. This has been done for you.
- Create a cross-tabulation ct between df['labels'] and df['varieties'] to count the number of times each grain variety coincides with each cluster label.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Perform the necessary imports
import ____ as ____
from ____ import ____
# Use fcluster to extract labels: labels
labels = ____
# Create a DataFrame with labels and varieties as columns: df
df = pd.DataFrame({'labels': labels, 'varieties': varieties})
# Create crosstab: ct
ct = ____
# Display ct
print(ct)
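As an illustration of the cross-tabulation step, the sketch below builds a DataFrame from made-up values and counts label/variety pairs with pd.crosstab(). The labels and the variety names 'A', 'B' and 'C' are hypothetical placeholders, not the grain data or its results.

```python
import pandas as pd

# Hypothetical cluster labels and varieties, for illustration only
labels = [1, 1, 1, 2, 2, 3, 3, 3]
varieties = ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'A']

# One row per sample, with its cluster label and its variety
df = pd.DataFrame({'labels': labels, 'varieties': varieties})

# Rows are cluster labels, columns are varieties, cells are counts
ct = pd.crosstab(df['labels'], df['varieties'])

print(ct)
```

A row with most of its count in a single column suggests that the cluster corresponds closely to one variety; counts spread across columns suggest the cluster mixes varieties.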