IniziaInizia gratis

Evaluating the grain clustering

In the previous exercise, you observed from the inertia plot that 3 is a good number of clusters for the grain data. In fact, the grain samples come from a mix of 3 different grain varieties: "Kama", "Rosa" and "Canadian". In this exercise, cluster the grain samples into three clusters, and compare the clusters to the grain varieties using a cross-tabulation.

You have the array samples of grain samples, and a list varieties giving the grain variety for each sample. Pandas (pd) and KMeans have already been imported for you.

Questo esercizio fa parte del corso

Unsupervised Learning in Python

Visualizza il corso

Istruzioni dell'esercizio

  • Create a KMeans model called model with 3 clusters.
  • Use the .fit_predict() method of model to fit it to samples and derive the cluster labels. Using .fit_predict() is the same as using .fit() followed by .predict().
  • Create a DataFrame df with two columns named 'labels' and 'varieties', using labels and varieties, respectively, for the column values. This has been done for you.
  • Use the pd.crosstab() function on df['labels'] and df['varieties'] to count the number of times each grain variety coincides with each cluster label. Assign the result to ct.
  • Hit submit to see the cross-tabulation!

Esercizio pratico interattivo

Prova a risolvere questo esercizio completando il codice di esempio.

# Create a KMeans model with 3 clusters: model
model = ____

# Use fit_predict to fit model and obtain cluster labels: labels
labels = ____

# Create a DataFrame with labels and varieties as columns: df
df = pd.DataFrame({'labels': labels, 'varieties': varieties})

# Create crosstab: ct
ct = ____

# Display ct
print(ct)
Modifica ed esegui il codice