
Elbow method

In the previous exercise you implemented MiniBatch K-means with 8 clusters without checking what the right number of clusters should be. For our first fraud detection approach, getting the number of clusters right matters, especially when you want to use the outliers of those clusters as fraud predictions. To decide how many clusters to use, apply the Elbow method and see what number of clusters it suggests.

X_scaled is again available for you to use, and MiniBatchKMeans has been imported from sklearn.cluster.

This exercise is part of the course

Fraud Detection in Python


Exercise instructions

  • Define the range to be between 1 and 5 clusters.
  • Run MiniBatch K-means for every cluster count in the range using a list comprehension.
  • Fit each model on the scaled data and obtain the scores from the scaled data.
  • Plot the cluster numbers against their respective scores. This will take a few seconds to run.
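
The steps above can be sketched on synthetic data to see what the scores look like before plotting. This is a minimal illustration, not the exercise solution: `X_demo` is a hypothetical stand-in for `X_scaled`, built from three artificial blobs, and the range of 1 to 6 clusters is chosen only for the demo. In scikit-learn, `score` returns the opposite of the K-means objective, so values are negative and move toward zero as clusters are added.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Assumption: three well-separated synthetic blobs stand in for X_scaled
X, _ = make_blobs(
    n_samples=600,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.6,
    random_state=0,
)
X_demo = StandardScaler().fit_transform(X)

# One model per candidate number of clusters
clustno = range(1, 7)
models = [MiniBatchKMeans(n_clusters=k, random_state=0, n_init=3) for k in clustno]

# Fit each model and score it on the same data
scores = [m.fit(X_demo).score(X_demo) for m in models]

for k, s in zip(clustno, scores):
    print(f"k={k}: score={s:.1f}")
```

With data like this, the score should improve sharply up to three clusters and only slightly afterwards, which is the bend the Elbow method looks for.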

Interactive exercise

Try this exercise by completing this sample code.

# Define the range of clusters to try
clustno = range(____, ____)

# Run MiniBatch Kmeans over the number of clusters
kmeans = [____(n_clusters=i, random_state=0) for ____ in ____]

# Obtain the score for each model
score = [kmeans[i].fit(____).score(____) for i in range(len(kmeans))]

# Plot the models and their respective score 
plt.plot(____, ____)
plt.xlabel('Number of Clusters')
plt.ylabel('Score')
plt.title('Elbow Curve')
plt.show()
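
Reading the elbow off the plot is a judgment call. As a rough cross-check, one possible heuristic is to pick the cluster count where the gain in score shrinks the most. The helper below, `elbow_by_second_difference`, is a hypothetical name and not part of scikit-learn, and the input scores are made-up illustrative numbers rather than results from this exercise.

```python
import numpy as np

def elbow_by_second_difference(ks, scores):
    """Return the k after which the improvement in score drops the most."""
    ks = list(ks)
    if len(ks) != len(scores) or len(ks) < 3:
        raise ValueError("need at least three (k, score) pairs of equal length")
    gains = np.diff(scores)           # improvement when going from k to k+1
    drop = gains[:-1] - gains[1:]     # how much that improvement shrinks at each interior k
    return ks[int(np.argmax(drop)) + 1]

# Made-up scores for k = 1..5, for illustration only
example_scores = [-1000, -600, -100, -80, -70]
print(elbow_by_second_difference(range(1, 6), example_scores))  # prints 3
```

A heuristic like this can disagree with a visual reading when the curve bends gradually, so treat its answer as a suggestion rather than a rule.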