Evaluating the grain clustering
In the previous exercise, you observed from the inertia plot that 3 is a good number of clusters for the grain data. In fact, the grain samples come from a mix of 3 different grain varieties: "Kama", "Rosa" and "Canadian". In this exercise, cluster the grain samples into three clusters, and compare the clusters to the grain varieties using a cross-tabulation.
You have the array samples of grain samples, and a list varieties giving the grain variety for each sample. Pandas (pd) and KMeans have already been imported for you.
This exercise is part of the course
Unsupervised Learning in Python
Exercise instructions
- Create a KMeans model called model with 3 clusters.
- Use the .fit_predict() method of model to fit it to samples and derive the cluster labels. Using .fit_predict() is the same as using .fit() followed by .predict().
- Create a DataFrame df with two columns named 'labels' and 'varieties', using labels and varieties, respectively, for the column values. This has been done for you.
- Use the pd.crosstab() function on df['labels'] and df['varieties'] to count the number of times each grain variety coincides with each cluster label. Assign the result to ct.
- Hit submit to see the cross-tabulation!
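The claim that .fit_predict() matches .fit() followed by .predict() can be checked with a minimal sketch. The data here is a synthetic stand-in (three well-separated blobs), not the grain samples, and the names demo_samples, model_a and model_b are hypothetical. Both models use the same random_state so that their initializations agree.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data (an assumption, NOT the grain samples):
# three well-separated blobs of 2-D points, 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
demo_samples = np.vstack(
    [c + rng.normal(scale=0.5, size=(20, 2)) for c in centers]
)

# One-step version: fit the model and return labels for the training data.
model_a = KMeans(n_clusters=3, n_init=10, random_state=0)
labels_a = model_a.fit_predict(demo_samples)

# Two-step version with the same seed: fit, then predict on the same data.
model_b = KMeans(n_clusters=3, n_init=10, random_state=0)
model_b.fit(demo_samples)
labels_b = model_b.predict(demo_samples)

# Compare the two label arrays element by element.
same = np.array_equal(labels_a, labels_b)
print(same)
```

The one-step form is simply more convenient when the labels are only needed for the data the model was fitted on.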
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create a KMeans model with 3 clusters: model
model = ____
# Use fit_predict to fit model and obtain cluster labels: labels
labels = ____
# Create a DataFrame with labels and varieties as columns: df
df = pd.DataFrame({'labels': labels, 'varieties': varieties})
# Create crosstab: ct
ct = ____
# Display ct
print(ct)
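
One way the blanks above might be filled in is sketched below. Because samples and varieties are only available inside the exercise environment, this version runs on synthetic stand-in data: demo_samples, demo_varieties and the variety names "variety_a", "variety_b" and "variety_c" are all hypothetical, and n_init and random_state are set only to make the run repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in data (an assumption, NOT the grain samples):
# three well-separated blobs, each tagged with a hypothetical variety name.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
names = ["variety_a", "variety_b", "variety_c"]
demo_samples = np.vstack(
    [c + rng.normal(scale=0.5, size=(20, 2)) for c in centers]
)
demo_varieties = [name for name in names for _ in range(20)]

# Create a KMeans model with 3 clusters: model
model = KMeans(n_clusters=3, n_init=10, random_state=0)

# Use fit_predict to fit model and obtain cluster labels: labels
labels = model.fit_predict(demo_samples)

# Create a DataFrame with labels and varieties as columns: df
df = pd.DataFrame({'labels': labels, 'varieties': demo_varieties})

# Create crosstab: rows are cluster labels, columns are varieties
ct = pd.crosstab(df['labels'], df['varieties'])

# Display ct
print(ct)
```

In the resulting table, each row is a cluster label and each column is a variety, so a cluster that lines up well with one variety shows a single large count in its row. The cluster numbers themselves are arbitrary; only the pattern of counts matters.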