Calculate sum of squared errors
In this exercise, you will calculate the sum of squared errors (SSE) for different numbers of clusters, ranging from 1 to 15. In this example we are using a custom-created dataset to get a cleaner elbow read.
We have loaded the normalized version of the data as data_normalized. The KMeans class from scikit-learn is already imported. Also, we have initialized an empty dictionary to store the sum of squared errors for each k as sse = {}.
Feel free to explore the data in the console.
This exercise is part of the course
Customer Segmentation in Python
Exercise instructions
- Fit KMeans and calculate SSE for each k with a range between 1 and 15.
- Initialize KMeans with k clusters and random state 1.
- Fit KMeans on the normalized dataset.
- Assign sum of squared distances to k element of sse dictionary.
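As a rough illustration of what "sum of squared distances" means here, the sketch below computes it by hand and compares it with the inertia_ attribute that a fitted scikit-learn KMeans model exposes. The two-blob dataset X, the seed, and the explicit n_init=10 are assumptions made for this example only; they are not part of the exercise.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data (not the course dataset): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(-2.0, 0.0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(2.0, 0.0), scale=0.5, size=(50, 2)),
])

# n_init is set explicitly so the behavior does not depend on the installed version's default.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=1).fit(X)

# For every sample, look up the centroid of the cluster it was assigned to,
# then sum the squared Euclidean distances to those centroids.
assigned_centers = kmeans.cluster_centers_[kmeans.labels_]
manual_sse = float(((X - assigned_centers) ** 2).sum())

print("manual SSE:", manual_sse)
print("inertia_:  ", kmeans.inertia_)
```

The two printed values should agree up to floating-point rounding, which is why inertia_ can serve as the SSE in this exercise.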
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Fit KMeans and calculate SSE for each k
for k in range(____, ____):

    # Initialize KMeans with k clusters
    kmeans = ____(n_clusters=____, random_state=1)

    # Fit KMeans on the normalized dataset
    kmeans.____(data_normalized)

    # Assign sum of squared distances to k element of dictionary
    sse[____] = kmeans.____
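
For reference, here is one possible way to fill in the template. It is a sketch under stated assumptions, not the official solution. Because the course dataset is not available outside the exercise, data_normalized is replaced by a hypothetical stand-in (three synthetic blobs, standardized per feature). The sketch also assumes the range is meant to include 15, hence range(1, 16), and it sets n_init=10 explicitly, which the exercise does not ask for.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the course's data_normalized: three synthetic blobs,
# standardized to zero mean and unit variance per feature.
rng = np.random.default_rng(42)
blob_centers = [(-5.0, -5.0), (0.0, 5.0), (5.0, -5.0)]
raw = np.vstack([rng.normal(loc=c, scale=0.6, size=(100, 2)) for c in blob_centers])
data_normalized = (raw - raw.mean(axis=0)) / raw.std(axis=0)

sse = {}

# Fit KMeans and calculate SSE for each k from 1 to 15 inclusive
for k in range(1, 16):
    # Initialize KMeans with k clusters (n_init is an added assumption)
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=1)

    # Fit KMeans on the normalized dataset
    kmeans.fit(data_normalized)

    # Assign sum of squared distances to k element of dictionary
    sse[k] = kmeans.inertia_

for k, value in sse.items():
    print(k, round(value, 2))
```

On data like this, the SSE should drop sharply up to k = 3 and flatten afterwards; that bend is the "elbow" the exercise description refers to.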