Evaluating the grain clustering
In the previous exercise, you observed from the inertia plot that 3 is a good number of clusters for the grain data. In fact, the grain samples come from a mix of 3 different grain varieties: "Kama", "Rosa" and "Canadian". In this exercise, cluster the grain samples into three clusters, and compare the clusters to the grain varieties using a cross-tabulation.
You are given an array, samples, of grain samples, and a list, varieties, giving the grain variety for each sample. pandas (as pd) and KMeans have already been imported for you.
This exercise is part of the course Unsupervised Learning in Python.
Exercise instructions
- Create a KMeans model called model with 3 clusters.
- Use the .fit_predict() method of model to fit it to samples and derive the cluster labels. Using .fit_predict() is the same as using .fit() followed by .predict().
- Create a DataFrame df with two columns named 'labels' and 'varieties', using labels and varieties, respectively, for the column values. This has been done for you.
- Use the pd.crosstab() function on df['labels'] and df['varieties'] to count the number of times each grain variety coincides with each cluster label. Assign the result to ct.
- Hit submit to see the cross-tabulation!
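To make the cross-tabulation step concrete, here is a minimal sketch of what pd.crosstab() counts. The six labels and varieties below are hypothetical toy values chosen for illustration, not the real grain data.

```python
import pandas as pd

# Hypothetical toy data: a cluster label and a variety name for six samples.
labels = [0, 0, 1, 1, 2, 2]
varieties = ["Kama", "Kama", "Rosa", "Kama", "Canadian", "Canadian"]

df = pd.DataFrame({"labels": labels, "varieties": varieties})

# Rows are cluster labels, columns are varieties,
# and each cell counts how often that pair occurs together.
ct = pd.crosstab(df["labels"], df["varieties"])
print(ct)
```

In this toy table, clusters 0 and 2 each contain a single variety, while cluster 1 mixes one Kama and one Rosa sample. A clustering that matches the varieties well puts most of each row's count in a single column.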
Hands-on interactive exercise
Complete this sample code to finish the exercise.
# Create a KMeans model with 3 clusters: model
model = ____
# Use fit_predict to fit model and obtain cluster labels: labels
labels = ____
# Create a DataFrame with labels and varieties as columns: df
df = pd.DataFrame({'labels': labels, 'varieties': varieties})
# Create crosstab: ct
ct = ____
# Display ct
print(ct)
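For reference, one way the blanks could be filled in is sketched below. The grain data is not available here, so the sketch assumes a synthetic stand-in: three well-separated 2-D blobs generated with NumPy, each arbitrarily tagged with one variety name. The n_init and random_state arguments are added only to make the run reproducible and are not part of the exercise.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the grain data (an assumption, not the real measurements):
# three well-separated 2-D blobs of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
samples = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])
varieties = ["Kama"] * 20 + ["Rosa"] * 20 + ["Canadian"] * 20

# Create a KMeans model with 3 clusters
model = KMeans(n_clusters=3, n_init=10, random_state=0)

# Fit the model and obtain a cluster label for every sample
labels = model.fit_predict(samples)

# On the training data, calling predict() after fitting gives the same labels
assert (labels == model.predict(samples)).all()

# Cross-tabulate cluster labels against varieties
df = pd.DataFrame({"labels": labels, "varieties": varieties})
ct = pd.crosstab(df["labels"], df["varieties"])
print(ct)
```

Because the synthetic blobs are far apart, each cluster should line up with exactly one variety. On the real grain data the table will generally be less clean, and the cluster numbers themselves are arbitrary: only the pattern of counts matters.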