
Calculate sum of squared errors

In this exercise, you will calculate the sum of squared errors (SSE) for each number of clusters from 1 to 15. The example uses a custom-built dataset so that the elbow in the SSE curve is easier to read.
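For reference, the sum of squared errors of a clustering is the total squared distance from each point to the centroid of the cluster it is assigned to. The toy example below is a minimal sketch of that definition; the points, centroids, labels, and the helper name sum_of_squared_errors are all made up for illustration and are not part of the exercise.

```python
import numpy as np

# Hypothetical toy data: four points forming two obvious clusters.
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
# Hypothetical centroids and cluster assignments for those points.
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])
labels = np.array([0, 0, 1, 1])


def sum_of_squared_errors(points, centroids, labels):
    """Total squared Euclidean distance from each point to its assigned centroid."""
    diffs = points - centroids[labels]  # per-point offset from its own centroid
    return float(np.sum(diffs ** 2))


toy_sse = sum_of_squared_errors(points, centroids, labels)
print(toy_sse)  # 4.0 (each point lies exactly 1 unit from its centroid)
```

Adding clusters generally lowers this total, which is why the elbow method looks for the point where the decrease levels off rather than for the minimum.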

The normalized version of the data is loaded as data_normalized. The KMeans class from scikit-learn is already imported, and an empty dictionary for storing the sum of squared errors is initialized as sse = {}.

Feel free to explore the data in the console.

This exercise is part of the course

Customer Segmentation in Python


Exercise instructions

  • Loop over k from 1 to 15, fitting KMeans and calculating the SSE for each k.
  • Initialize KMeans with k clusters and a random state of 1.
  • Fit KMeans on the normalized dataset.
  • Assign the sum of squared distances to element k of the sse dictionary.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Fit KMeans and calculate SSE for each k
for k in range(____, ____):
  
    # Initialize KMeans with k clusters
    kmeans = ____(n_clusters=____, random_state=1)
    
    # Fit KMeans on the normalized dataset
    kmeans.____(data_normalized)
    
    # Assign sum of squared distances to k element of dictionary
    sse[____] = kmeans.____
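
One way the completed loop might look is sketched below. Several things here are assumptions rather than part of the exercise: the data is a synthetic stand-in generated with NumPy (the exercise's real dataset is not shown), n_init=10 is set explicitly, and range(1, 16) assumes that 15 is meant to be included. The inertia_ attribute is where a fitted scikit-learn KMeans stores the sum of squared distances of samples to their closest cluster center.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for the exercise's data_normalized (assumption: synthetic blobs).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
raw = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers])
# Standardize each column to zero mean and unit variance.
data_normalized = (raw - raw.mean(axis=0)) / raw.std(axis=0)

sse = {}

# Fit KMeans and record the SSE for each k from 1 to 15 inclusive.
for k in range(1, 16):
    # n_init=10 is an explicit choice here, not something the exercise specifies.
    kmeans = KMeans(n_clusters=k, random_state=1, n_init=10)
    kmeans.fit(data_normalized)
    # inertia_ holds the sum of squared distances to the closest centroid.
    sse[k] = kmeans.inertia_

for k, value in sse.items():
    print(k, round(value, 2))
```

With k = 1 the single centroid is the data mean, so sse[1] equals the total squared deviation from the mean; the values then shrink as k grows.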