
Calculate and plot sum of squared errors

Now you will calculate the sum of squared errors (SSE) for different numbers of clusters, ranging from 1 to 10.

You will use the normalized RFMT data that you created in the previous exercise; it is stored as datamart_rfmt_normalized. The KMeans class from scikit-learn has also been imported, and an empty dictionary for storing the sum of squared errors has been initialized as sse = {}.

Feel free to explore the data in the console.

This exercise is part of the course

Customer Segmentation in Python


Exercise instructions

  • Initialize KMeans with k clusters and random state 1 and fit KMeans on the normalized dataset.
  • Assign the sum of squared distances to the k element of the sse dictionary.
  • Add the plot title "The Elbow Method", X-axis label "k", and Y-axis label "SSE".
  • Plot SSE values for each k stored as keys in the dictionary.
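
As a rough illustration of what the "sum of squared distances" in these instructions refers to, the sketch below fits KMeans on a small synthetic dataset and compares the fitted model's inertia_ attribute with a manual calculation. The synthetic array X, the blob centers, and the explicit n_init=10 are assumptions made for this example only; they stand in for datamart_rfmt_normalized, which is not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the normalized data (an assumption, not the course dataset)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# n_init is set explicitly (an assumption) so behavior does not depend on the library default
kmeans = KMeans(n_clusters=3, random_state=1, n_init=10).fit(X)

# Squared distance from each point to the center of its assigned cluster, summed over all points
assigned_centers = kmeans.cluster_centers_[kmeans.labels_]
manual_sse = float(((X - assigned_centers) ** 2).sum())

# The two numbers should agree up to floating-point error
print(manual_sse, kmeans.inertia_)
```

The manual value and inertia_ should match closely, which is why inertia_ can serve as the SSE for a given k.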

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Fit KMeans and calculate SSE for each k between 1 and 10
for k in range(1, 11):
  
    # Initialize KMeans with k clusters and fit it 
    kmeans = ____(____=____, ____=1).____(datamart_rfmt_normalized)
    
    # Assign sum of squared distances to k element of the sse dictionary
    ____[____] = kmeans.____   

# Add the plot title, x and y axis labels
plt.____('The Elbow Method')
plt.____('____')
plt.____('____')

# Plot SSE values for each k stored as keys in the dictionary
sns.____(x=list(sse.____()), y=list(sse.____()))
plt.show()
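
For reference, here is one way the whole elbow-method workflow might look when run end to end outside the exercise environment. This is a sketch under stated assumptions rather than the official solution: the synthetic array data replaces datamart_rfmt_normalized, n_init=10 is set explicitly, and the Agg backend with plt.close() is used so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.cluster import KMeans

# Synthetic stand-in for the normalized RFMT data (an assumption for this sketch)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
data = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Fit KMeans for each k from 1 to 10 and record the sum of squared errors
sse = {}
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, random_state=1, n_init=10).fit(data)
    sse[k] = kmeans.inertia_

# Label the figure, then plot SSE against k
plt.title("The Elbow Method")
plt.xlabel("k")
plt.ylabel("SSE")
sns.pointplot(x=list(sse.keys()), y=list(sse.values()))
plt.close()  # in an interactive session, call plt.show() instead
```

With three well-separated synthetic clusters, the SSE should drop sharply up to k = 3 and flatten afterwards; that bend is the "elbow" the plot title refers to.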