DBSCAN

In this exercise you're going to explore using a density-based clustering method (DBSCAN) to detect fraud. The advantage of DBSCAN is that you do not need to define the number of clusters beforehand. DBSCAN can also handle irregularly shaped (i.e. non-convex) clusters much better than K-means can. This time, you are not going to take the outliers of the clusters and flag those as fraud; instead, you will take the smallest clusters in the data and label those as fraud. You again have the scaled dataset, X_scaled, available. Let's give it a try!

Import DBSCAN.
Initialize a DBSCAN model, setting the maximum distance between two samples (eps) to 0.9 and the minimum number of observations in a cluster (min_samples) to 10, and fit the model to the scaled data.
Obtain the predicted labels; these are the cluster numbers assigned to each observation.
Print the number of clusters and the rest of the performance metrics.
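
The steps above can be sketched roughly as follows. This is a minimal sketch, not the exercise's reference solution: X_scaled and the true labels are not available here, so the code builds stand-in data with make_blobs and StandardScaler (an assumption), and it assumes that "performance metrics" means the homogeneity and silhouette scores. DBSCAN marks noise points with the label -1, so that label is excluded when counting clusters.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import homogeneity_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): in the exercise, X_scaled is already provided.
X, labels = make_blobs(n_samples=500, centers=3, cluster_std=0.6, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# eps: maximum distance between two samples to count as neighbours.
# min_samples: minimum number of observations needed to form a dense region.
db = DBSCAN(eps=0.9, min_samples=10).fit(X_scaled)

# Cluster number assigned to each observation; -1 means noise.
pred_labels = db.labels_

# Number of clusters, not counting the noise label.
n_clusters = len(set(pred_labels)) - (1 if -1 in pred_labels else 0)
print("Estimated number of clusters: %d" % n_clusters)

# Homogeneity compares predicted clusters against known labels.
homogeneity = homogeneity_score(labels, pred_labels)
print("Homogeneity: %0.3f" % homogeneity)

# The silhouette score is only defined for at least two distinct labels.
if len(set(pred_labels)) > 1:
    print("Silhouette Coefficient: %0.3f" % silhouette_score(X_scaled, pred_labels))

# Cluster sizes (noise excluded), useful for spotting the smallest clusters.
cluster_sizes = np.bincount(pred_labels[pred_labels >= 0])
print("Cluster sizes:", cluster_sizes)
```

With the real data, only the DBSCAN call and the lines after it would be needed. The cluster sizes printed at the end are what you would inspect to pick out the smallest clusters as fraud candidates.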
