Calculate and plot sum of squared errors
Now, you will calculate the sum of squared errors for different numbers of clusters, ranging from 1 to 10.
You will use the normalized RFMT data that you created in the previous exercise; it is stored as datamart_rfmt_normalized. The KMeans class from scikit-learn is also imported. In addition, an empty dictionary for storing the sum of squared errors has been initialized as sse = {}.
Feel free to explore the data in the console.
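To make the quantity concrete, here is a minimal sketch of what the sum of squared errors measures: the squared distance from each point to the mean of its cluster, added up over all points. The toy points, the labels, and the helper name sum_of_squared_errors are made up for illustration and are not part of the exercise.

```python
# Toy illustration: SSE for a fixed assignment of 1-D points to clusters.
# The points and labels below are invented for this sketch.
points = [1.0, 2.0, 3.0, 10.0, 12.0]
labels = [0, 0, 0, 1, 1]


def sum_of_squared_errors(points, labels):
    """Sum of squared distances from each point to its cluster's mean."""
    # Group the points by cluster label
    clusters = {}
    for value, label in zip(points, labels):
        clusters.setdefault(label, []).append(value)

    # Add up squared deviations from each cluster's centroid
    total = 0.0
    for members in clusters.values():
        centroid = sum(members) / len(members)
        total += sum((value - centroid) ** 2 for value in members)
    return total


# Two clusters versus everything lumped into a single cluster
sse_two_clusters = sum_of_squared_errors(points, labels)
sse_one_cluster = sum_of_squared_errors(points, [0] * len(points))
print(sse_two_clusters, sse_one_cluster)
```

Splitting the points into two sensible groups gives a much smaller SSE than treating them as one cluster, which is the drop the elbow plot is meant to reveal.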
This exercise is part of the course
Customer Segmentation in Python
Exercise instructions
- Initialize KMeans with k clusters and random state 1, and fit KMeans on the normalized dataset.
- Assign the sum of squared distances to the k element of the sse dictionary.
- Add the plot title "The Elbow Method", X-axis label "k", and Y-axis label "SSE".
- Plot the SSE values for each k stored as keys in the dictionary.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Fit KMeans and calculate SSE for each k between 1 and 10
for k in range(1, 11):
    # Initialize KMeans with k clusters and fit it
    kmeans = ____(____=____, ____=1).____(datamart_rfmt_normalized)
    # Assign sum of squared distances to k element of the sse dictionary
    ____[____] = kmeans.____
# Add the plot title, x and y axis labels
plt.____('The Elbow Method')
plt.____('____')
plt.____('____')
# Plot SSE values for each k stored as keys in the dictionary
sns.____(x=list(sse.____()), y=list(sse.____()))
plt.show()
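For reference, the same workflow can be sketched end to end on synthetic data. This is one possible version under stated assumptions, not the exercise's official solution: the array data is a made-up stand-in for datamart_rfmt_normalized, n_init=10 is set explicitly so results do not depend on the scikit-learn version's default, and the plot uses plain matplotlib rather than the seaborn call the scaffold leaves blank.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Synthetic stand-in for datamart_rfmt_normalized (assumption: 4 numeric
# columns, three well-separated groups of 100 rows each)
rng = np.random.default_rng(1)
centers = np.array([[-2, -2, -2, -2], [0, 0, 0, 0], [2, 2, 2, 2]], dtype=float)
data = np.vstack([c + 0.5 * rng.standard_normal((100, 4)) for c in centers])

# Fit KMeans and record the SSE for each k between 1 and 10
sse = {}
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, random_state=1, n_init=10).fit(data)
    # inertia_ is the sum of squared distances to the closest cluster center
    sse[k] = kmeans.inertia_


def plot_elbow(sse):
    """Plot SSE against k with the labels used in the exercise."""
    plt.title('The Elbow Method')
    plt.xlabel('k')
    plt.ylabel('SSE')
    plt.plot(list(sse.keys()), list(sse.values()), marker='o')
    plt.show()


if __name__ == "__main__":
    plot_elbow(sse)
```

With data generated this way, the curve should fall steeply up to k = 3 and flatten afterwards; on real RFMT data the bend is usually less sharp and picking k involves some judgment.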