Elbow method

In the previous exercise you implemented MiniBatch K-means with 8 clusters without checking what the right number of clusters should be. For our first fraud detection approach, it is important to get the number of clusters right, especially when you want to use the outliers of those clusters as fraud predictions. To decide how many clusters to use, let's apply the Elbow method and see what number of clusters it suggests.

X_scaled is again available for you to use and MiniBatchKMeans has been imported from sklearn.

This exercise is part of the course

Fraud Detection in Python

Exercise instructions

  • Define the range to be between 1 and 5 clusters.
  • Run MiniBatch K-means on all the clusters in the range using a list comprehension.
  • Fit each model on the scaled data and obtain the scores from the scaled data.
  • Plot the cluster numbers and their respective scores. This will take a few seconds to run.
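
To make the "scores" in these instructions concrete, here is a minimal sketch of what a fitted model's score looks like. It assumes small random data (X_demo, a hypothetical stand-in for X_scaled) and an explicit n_init=3, which is not part of the exercise and is only set to keep behavior the same across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Hypothetical stand-in for X_scaled: 200 random points in 2 dimensions
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))

# n_init is set explicitly here as an assumption; the exercise does not specify it
model = MiniBatchKMeans(n_clusters=3, random_state=0, n_init=3).fit(X_demo)

# score() returns the negative of the K-means objective on the data passed in,
# so it is never positive, and values closer to zero mean tighter clusters
demo_score = model.score(X_demo)
print(demo_score)
```

Because the score is a negated sum of squared distances, the curve in an elbow plot of scores rises toward zero as clusters are added, and the elbow is the point where that rise flattens.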

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Define the range of clusters to try
clustno = range(____, ____)

# Run MiniBatch K-means over the number of clusters
kmeans = [____(n_clusters=i, random_state=0) for ____ in ____]

# Obtain the score for each model
score = [kmeans[i].fit(____).score(____) for i in range(len(kmeans))]

# Plot the models and their respective score 
plt.plot(____, ____)
plt.xlabel('Number of Clusters')
plt.ylabel('Score')
plt.title('Elbow Curve')
plt.show()
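
One possible way to fill in the template is sketched below. It is not the course's reference solution: X_demo is a hypothetical synthetic stand-in for X_scaled, n_init=3 is added as an assumption, and range(1, 5) is one reading of "between 1 and 5" (Python's range excludes the stop value, so this tries 1 through 4 clusters).

```python
import matplotlib.pyplot as plt
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the course's X_scaled: three synthetic blobs, standardized
X_raw, _ = make_blobs(n_samples=600, centers=3, random_state=0)
X_demo = StandardScaler().fit_transform(X_raw)

# Define the range of clusters to try (stop value is excluded: 1, 2, 3, 4)
clustno = range(1, 5)

# Build one MiniBatch K-means model per cluster count
# (n_init is an assumption added here, not part of the exercise template)
kmeans = [MiniBatchKMeans(n_clusters=i, random_state=0, n_init=3) for i in clustno]

# Fit each model on the data and obtain its score on the same data
score = [model.fit(X_demo).score(X_demo) for model in kmeans]

# Plot the cluster counts against their respective scores
plt.plot(clustno, score)
plt.xlabel('Number of Clusters')
plt.ylabel('Score')
plt.title('Elbow Curve')
plt.show()
```

Iterating over the models directly, rather than indexing with range(len(kmeans)) as the template does, gives the same result and is a little easier to read.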