ComeçarComece de graça

DBSCAN

In this exercise you're going to explore using a density based clustering method (DBSCAN) to detect fraud. The advantage of DBSCAN is that you do not need to define the number of clusters beforehand. Also, DBSCAN can handle weirdly shaped data (i.e. non-convex) much better than K-means can. This time, you are not going to take the outliers of the clusters and use that for fraud, but take the smallest clusters in the data and label those as fraud. You again have the scaled dataset, i.e. X_scaled available. Let's give it a try!

Este exercício faz parte do curso

Fraud Detection in Python

Ver curso

Instruções do exercício

  • Import DBSCAN.
  • Initialize a DBSCAN model setting the maximum distance between two samples to 0.9 and the minimum observations in the clusters to 10, and fit the model to the scaled data.
  • Obtain the predicted labels, these are the cluster numbers assigned to an observation.
  • Print the number of clusters and the rest of the performance metrics.

Exercício interativo prático

Experimente este exercício completando este código de exemplo.

# Import DBSCAN
from sklearn.cluster import ____

# Initialize and fit the DBSCAN model
db = DBSCAN(eps=____, min_samples=____, n_jobs=-1).fit(____)

# Obtain the predicted labels and calculate number of clusters
pred_labels = ____.____
n_clusters = len(set(pred_labels)) - (1 if -1 in labels else 0)

# Print performance metrics for DBSCAN
print('Estimated number of clusters: %d' % ____)
print("Homogeneity: %0.3f" % homogeneity_score(labels, pred_labels))
print("Silhouette Coefficient: %0.3f" % silhouette_score(X_scaled, pred_labels))
Editar e executar o código