Determine the optimal number of clusters
Here, you will use the elbow criterion method to identify the number of clusters beyond which the decrease in the sum of squared errors (SSE) becomes marginal. This is an important step for getting a mathematical ballpark number of clusters to start testing. You will iterate over several values of k, run a KMeans algorithm for each, and then plot the errors against k to identify the "elbow" where the decrease in errors slows down.
The KMeans class has been loaded from sklearn.cluster, the seaborn library as sns, and the matplotlib.pyplot module as plt. The scaled dataset is also available as a pandas DataFrame named wholesale_scaled_df.
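To make the plotted quantity concrete, here is a minimal sketch of the sum of squared errors: the squared distance from every point to the centroid of its cluster, summed over all points. This is the quantity scikit-learn exposes as inertia_. The toy points and the helper names centroid and sse_for are illustrative assumptions, not part of the exercise.

```python
# Hypothetical toy data: two tight groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (5.0, 5.0), (5.0, 6.0)]


def centroid(cluster):
    """Coordinate-wise mean of the points in one cluster."""
    n = len(cluster)
    dims = len(cluster[0])
    return tuple(sum(p[i] for p in cluster) / n for i in range(dims))


def sse_for(clusters):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for cluster in clusters:
        c = centroid(cluster)
        total += sum((x - cx) ** 2 for p in cluster for x, cx in zip(p, c))
    return total


# One cluster holding everything versus the two natural groups.
sse_k1 = sse_for([points])
sse_k2 = sse_for([points[:2], points[2:]])
print(sse_k1, sse_k2)
```

Going from one cluster to two removes most of the error on this toy data. Once k exceeds the number of natural groups, further splits can only reduce the error by small amounts, which is what produces the elbow.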
This exercise is part of the course
Machine Learning for Marketing in Python
Exercise instructions
- Create an empty sse dictionary.
- Fit a KMeans algorithm on k values between 1 and 11 and store the errors in the sse dictionary.
- Add the title to the plot.
- Create a scatter plot with keys on the X-axis and values on the Y-axis and display the chart.
Interactive exercise
Complete the sample code to finish this exercise successfully.
# Create empty sse dictionary
sse = {}
# Fit KMeans algorithm on k values between 1 and 11
for k in ___(1, 11):
    kmeans = ___(n_clusters=___, random_state=333)
    kmeans.___(wholesale_scaled_df)
    sse[k] = kmeans.inertia_
# Add the title to the plot
plt.___('Elbow criterion method chart')
# Create and display a scatter plot
sns.pointplot(x=list(sse.___()), y=list(sse.___()))
plt.___()
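One possible way to fill in the template is sketched below as a self-contained script. Because wholesale_scaled_df is only available inside the exercise environment, the sketch builds a synthetic stand-in with make_blobs and StandardScaler. That data, its column names, and the explicit n_init=10 setting are assumptions for illustration, not part of the original exercise.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for the course's wholesale_scaled_df.
raw, _ = make_blobs(n_samples=300, centers=4, n_features=3, random_state=333)
wholesale_scaled_df = pd.DataFrame(
    StandardScaler().fit_transform(raw), columns=["f1", "f2", "f3"]
)

# Create empty sse dictionary
sse = {}

# Fit KMeans for each k; range(1, 11) yields k = 1 through 10
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=333)
    kmeans.fit(wholesale_scaled_df)
    # inertia_ is the sum of squared distances to the closest centroid
    sse[k] = kmeans.inertia_

# Add the title to the plot
plt.title('Elbow criterion method chart')

# Plot k on the X-axis against SSE on the Y-axis, then display the chart
sns.pointplot(x=list(sse.keys()), y=list(sse.values()))
plt.show()
```

Note that range(1, 11) stops at k = 10, so the chart shows ten points. On the synthetic data the curve should flatten around the number of generated blobs; on the real dataset the elbow is usually less sharp and serves only as a starting point for testing nearby values of k.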