Calculate sum of squared errors

In this exercise, you will calculate the sum of squared errors (SSE) for different numbers of clusters, ranging from 1 to 15. This example uses a custom-created dataset so that the elbow is easier to read.
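SSE is the sum, over all points, of the squared Euclidean distance from each point to the centroid of the cluster it is assigned to. After fitting, scikit-learn exposes this value as the inertia_ attribute of a KMeans object. The minimal sketch below computes SSE by hand on a hypothetical four-point toy dataset (not the course data) and compares it with inertia_.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy dataset: two obvious pairs of points
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])

# n_init is set explicitly here (an assumption, not part of the exercise)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=1).fit(X)

# Look up the centroid assigned to each point, then sum the squared distances
assigned_centroids = kmeans.cluster_centers_[kmeans.labels_]
sse_manual = np.sum((X - assigned_centroids) ** 2)

print("manual SSE:", sse_manual)
print("inertia_:  ", kmeans.inertia_)
```

Each pair has its centroid halfway between its two points, so every point contributes 0.5 squared, or 0.25, and the total SSE is 1.0.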

The normalized version of the data has been loaded as data_normalized, and the KMeans class from scikit-learn is already imported. An empty dictionary, sse = {}, has also been initialized to store the sum of squared errors for each number of clusters.
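The course provides its own data_normalized, so the setup below is only a hypothetical stand-in for experimenting outside the course environment. It assumes that "normalized" means each column is scaled to zero mean and unit variance with scikit-learn's StandardScaler, and it uses make_blobs to generate three synthetic clusters. Neither the column names nor the blob parameters come from the exercise.

```python
import pandas as pd
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the custom dataset: 300 points in 3 blobs, 3 features
raw, _ = make_blobs(n_samples=300, centers=3, n_features=3, random_state=0)

# Scale each column to zero mean and unit variance (assumed normalization)
data_normalized = pd.DataFrame(
    StandardScaler().fit_transform(raw), columns=["f1", "f2", "f3"]
)

# Empty dictionary that will map each k to its SSE
sse = {}

# Means should be about 0 and population standard deviations about 1
print(data_normalized.mean().round(6))
print(data_normalized.std(ddof=0).round(6))
```

StandardScaler divides by the population standard deviation, so the check uses ddof=0. pandas defaults to ddof=1.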

Feel free to explore the data in the console.

This exercise is part of the course Customer Segmentation in Python.


Exercise instructions

Fit KMeans and calculate the SSE for each k from 1 to 15 (inclusive).
Initialize KMeans with k clusters and a random state of 1.
Fit KMeans on the normalized dataset.
Store the sum of squared distances in the sse dictionary under the key k.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Fit KMeans and calculate SSE for each k
for k in range(____, ____):
  
    # Initialize KMeans with k clusters
    kmeans = ____(n_clusters=____, random_state=1)
    
    # Fit KMeans on the normalized dataset
    kmeans.____(data_normalized)
    
    # Assign the sum of squared distances to key k of the sse dictionary
    sse[____] = kmeans.____
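One way to fill in the blanks is sketched below as a self-contained script. It rebuilds a hypothetical synthetic stand-in for data_normalized, because the course dataset is not available outside the exercise. The upper bound of range is exclusive, so range(1, 16) covers k = 1 to 15. The sum of squared distances is read from KMeans.inertia_. Setting n_init=10 explicitly is an assumption beyond the exercise, made because the default for this parameter changed in recent scikit-learn releases.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for data_normalized (the course supplies its own data)
raw, _ = make_blobs(n_samples=300, centers=3, n_features=3, random_state=0)
data_normalized = pd.DataFrame(StandardScaler().fit_transform(raw))

sse = {}

# Fit KMeans and calculate SSE for each k from 1 to 15 (end of range is exclusive)
for k in range(1, 16):

    # Initialize KMeans with k clusters
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=1)

    # Fit KMeans on the normalized dataset
    kmeans.fit(data_normalized)

    # Assign the sum of squared distances to key k of the sse dictionary
    sse[k] = kmeans.inertia_

for k, value in sse.items():
    print(k, round(value, 2))
```

With k = 1 the single centroid is the column mean. On standardized data, the SSE therefore equals the number of rows times the number of columns, here 300 × 3 = 900. To read the elbow, plot sse against k and look for the point where the curve stops dropping sharply. With the three synthetic blobs assumed here, that bend would be expected around k = 3.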
