Clustering the fish data
You'll now use your standardization and clustering pipeline from the previous exercise to cluster the fish by their measurements, and then create a cross-tabulation to compare the cluster labels with the fish species.
As before, `samples` is the 2D array of fish measurements. Your pipeline is available as `pipeline`, and the species of every fish sample is given by the list `species`.
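For orientation, here is a minimal sketch of what a standardization and clustering pipeline of this kind might look like. It assumes scikit-learn's `StandardScaler` and `KMeans`. The names `toy_samples`, `toy_pipeline`, and `toy_labels`, the synthetic data, and the choice of `n_clusters=2` are all illustrative assumptions, not the course's actual data or settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the fish measurements: two well-separated groups
# of 20 samples each, with 3 features on very different scales.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[200.0, 25.0, 0.4], scale=[10.0, 1.0, 0.02], size=(20, 3))
group_b = rng.normal(loc=[900.0, 40.0, 0.1], scale=[10.0, 1.0, 0.02], size=(20, 3))
toy_samples = np.vstack([group_a, group_b])

# Standardize each feature, then cluster with k-means.
# n_clusters=2 matches the toy data only; it is an assumption for this sketch.
toy_pipeline = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=2, n_init=10, random_state=0),
)

# Fit the whole pipeline, then assign each sample to a cluster.
toy_pipeline.fit(toy_samples)
toy_labels = toy_pipeline.predict(toy_samples)
```

Standardizing first keeps the feature with the largest numeric range from dominating the distance computation inside k-means.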
This exercise is part of the course Unsupervised Learning in Python.
Exercise instructions
- Import `pandas` as `pd`.
- Fit the pipeline to the fish measurements `samples`.
- Obtain the cluster labels for `samples` by using the `.predict()` method of `pipeline`.
- Using `pd.DataFrame()`, create a DataFrame `df` with two columns named `'labels'` and `'species'`, using `labels` and `species`, respectively, for the column values.
- Using `pd.crosstab()`, create a cross-tabulation `ct` of `df['labels']` and `df['species']`.
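To show how the DataFrame and cross-tabulation steps fit together, here is a small sketch on made-up values. The lists `toy_labels` and `toy_species` are hypothetical and are not taken from the fish dataset. Only `pd.DataFrame()` and `pd.crosstab()` are used, as named in the instructions.

```python
import pandas as pd

# Hypothetical cluster labels and species names, for illustration only.
toy_labels = [0, 0, 1, 1, 1, 2]
toy_species = ["Bream", "Bream", "Roach", "Roach", "Bream", "Pike"]

# One row per sample: its cluster label alongside its known species.
toy_df = pd.DataFrame({"labels": toy_labels, "species": toy_species})

# Count how many samples fall into each (label, species) combination.
toy_ct = pd.crosstab(toy_df["labels"], toy_df["species"])
print(toy_ct)
```

Each row of the resulting table is a cluster and each column is a species. In this toy case, cluster 0 contains only Bream, while cluster 1 mixes two Roach with one Bream, which is the kind of mismatch a cross-tabulation makes easy to spot.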
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import pandas
import pandas as pd
# Fit the pipeline to samples
____
# Calculate the cluster labels: labels
labels = ____
# Create a DataFrame with labels and species as columns: df
df = ____
# Create crosstab: ct
ct = ____
# Display ct
print(ct)