Clustering the fish data
You'll now use your standardization and clustering pipeline from the previous exercise to cluster the fish by their measurements, and then create a cross-tabulation to compare the cluster labels with the fish species.
As before, samples is the 2D array of fish measurements. Your pipeline is available as pipeline, and the species of every fish sample is given by the list species.
This exercise is part of the course
Unsupervised Learning in Python
Exercise instructions
- Import pandas as pd.
- Fit the pipeline to the fish measurements samples.
- Obtain the cluster labels for samples by using the .predict() method of pipeline.
- Using pd.DataFrame(), create a DataFrame df with two columns named 'labels' and 'species', using labels and species, respectively, for the column values.
- Using pd.crosstab(), create a cross-tabulation ct of df['labels'] and df['species'].
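To see what the cross-tabulation step produces, here is a minimal sketch on a handful of made-up values. The labels and species lists below are hypothetical toy data, not the fish dataset from the exercise.

```python
import pandas as pd

# Hypothetical toy data: six samples with made-up cluster labels and species
labels = [0, 1, 1, 0, 2, 2]
species = ['Bream', 'Roach', 'Roach', 'Bream', 'Pike', 'Roach']

# Build a DataFrame with one column per list
df = pd.DataFrame({'labels': labels, 'species': species})

# Count how many samples fall into each (label, species) combination
ct = pd.crosstab(df['labels'], df['species'])
print(ct)
```

Each row of ct corresponds to a cluster label and each column to a species, so a row dominated by a single column suggests the cluster lines up well with that species.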
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import pandas
import pandas as pd
# Fit the pipeline to samples
____
# Calculate the cluster labels: labels
labels = ____
# Create a DataFrame with labels and species as columns: df
df = ____
# Create crosstab: ct
ct = ____
# Display ct
print(ct)
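One possible way to fill in the blanks is sketched below. Because the exercise's own samples, species, and pipeline are not shown here, the sketch builds stand-ins: the synthetic measurements, the three species names, and the choice of StandardScaler followed by KMeans with three clusters are all assumptions made for illustration, not the course's actual setup.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for the fish measurements, three
# well-separated groups of 10 samples with 3 features each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [100.0, 0.0, 50.0], [0.0, 100.0, -50.0]])
samples = np.vstack([c + rng.normal(scale=1.0, size=(10, 3)) for c in centers])

# Assumption: stand-in species list, one entry per sample
species = ['Bream'] * 10 + ['Roach'] * 10 + ['Pike'] * 10

# Assumption: stand-in for the pipeline from the previous exercise
pipeline = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=3, n_init=10, random_state=0),
)

# Fit the pipeline to samples
pipeline.fit(samples)

# Calculate the cluster labels: labels
labels = pipeline.predict(samples)

# Create a DataFrame with labels and species as columns: df
df = pd.DataFrame({'labels': labels, 'species': species})

# Create crosstab: ct
ct = pd.crosstab(df['labels'], df['species'])

# Display ct
print(ct)
```

In the exercise itself only the four blanks need filling in, since samples, species, and pipeline are already provided.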