Elbow method
In the previous exercise you implemented MiniBatch K-means with 8 clusters without checking what the right number of clusters should be. For our first fraud detection approach, it is important to get the number of clusters right, especially when you want to use the outliers of those clusters as fraud predictions. To decide how many clusters to use, let's apply the Elbow method and see what number of clusters it suggests.
X_scaled is again available for you to use, and MiniBatchKMeans has been imported from sklearn.
This exercise is part of the course Fraud Detection in Python.
Exercise instructions
- Define the range to be between 1 and 5 clusters.
- Create a MiniBatch K-means model for each number of clusters in the range, using a list comprehension.
- Fit each model on the scaled data and obtain the scores from the scaled data.
- Plot the cluster numbers against their respective scores. This will take a few seconds to run.
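The score obtained in the third step may be easier to read with a small illustration. In scikit-learn, calling `score` on a fitted K-means model returns the negative of the K-means objective: the negative sum of squared distances from each sample to its nearest centroid. Values closer to zero therefore mean tighter clusters. The sketch below checks this on synthetic data; `X_demo` and its hand-picked blob centers are assumptions made for illustration and stand in for `X_scaled`, not the course data.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for X_scaled (assumption: illustrative data only)
X_demo, _ = make_blobs(
    n_samples=500,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.5,
    random_state=0,
)

model = MiniBatchKMeans(n_clusters=3, random_state=0, n_init=3).fit(X_demo)

# Squared distance from every sample to every centroid, shape (500, 3)
diffs = X_demo[:, None, :] - model.cluster_centers_[None, :, :]
sq_dist = (diffs ** 2).sum(axis=2)

# Sum of each sample's squared distance to its nearest centroid
manual_inertia = sq_dist.min(axis=1).sum()

model_score = model.score(X_demo)
print(model_score, -manual_inertia)  # the two values should agree closely
```

Because the score is never positive and moves toward zero as clusters are added, the elbow curve rises steeply at first and then levels off.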
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Define the range of clusters to try
clustno = range(____, ____)
# Run MiniBatch K-means over the number of clusters
kmeans = [____(n_clusters=i, random_state=0) for ____ in ____]
# Obtain the score for each model
score = [kmeans[i].fit(____).score(____) for i in range(len(kmeans))]
# Plot the models and their respective score
plt.plot(____, ____)
plt.xlabel('Number of Clusters')
plt.ylabel('Score')
plt.title('Elbow Curve')
plt.show()
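For comparison, here is a minimal sketch of the same elbow procedure run end to end on synthetic data rather than on `X_scaled`. The names `X_demo`, `clustno_demo`, `models`, `scores` and `gains`, the blob centers, and the range of 1 to 6 clusters are all assumptions chosen for illustration. It prints the scores instead of plotting them.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

# Synthetic data with three well-separated groups (assumption: illustrative only)
X_demo, _ = make_blobs(
    n_samples=600,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.6,
    random_state=42,
)

# range excludes its stop value, so this tries 1 through 6 clusters
clustno_demo = range(1, 7)

# One unfitted model per candidate number of clusters
models = [MiniBatchKMeans(n_clusters=k, random_state=0, n_init=3) for k in clustno_demo]

# Fit each model and score it on the same data
scores = [m.fit(X_demo).score(X_demo) for m in models]

for k, s in zip(clustno_demo, scores):
    print(f"k={k}: score={s:.1f}")

# Improvement in score from adding one more cluster;
# the elbow is roughly where these improvements become small
gains = [later - earlier for earlier, later in zip(scores, scores[1:])]
print(gains)
```

On data like this, with three clear groups, the improvements should be large up to three clusters and small afterwards, which is the bend the Elbow method looks for. Real transaction data rarely shows such a sharp bend, so the choice usually involves some judgment.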