In this chapter, you'll learn about the typical challenges of fraud detection and how to resample your data to tackle the problems caused by imbalanced classes.
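To make the imbalance problem concrete, here is a minimal sketch that checks the fraud ratio of a toy dataset and then balances it by random oversampling. The data, the function names `fraud_ratio` and `random_oversample`, and the 95/5 split are assumptions made for illustration, not course material. Random oversampling is used as a simpler stand-in for SMOTE: it duplicates existing minority rows, whereas SMOTE synthesizes new minority points by interpolating between neighbors.

```python
import random
from collections import Counter


def fraud_ratio(labels):
    """Return the share of observations labeled 1 (fraud)."""
    return sum(labels) / len(labels)


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until both classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    majority = max(counts, key=counts.get)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    n_extra = counts[majority] - counts[minority]
    extra = [rng.choice(minority_rows) for _ in range(n_extra)]
    # Return new lists so the original data stays untouched.
    return rows + extra, labels + [minority] * n_extra


# Hypothetical toy data: 95 legitimate transactions and 5 fraudulent ones.
rows = [[float(i)] for i in range(100)]
labels = [0] * 95 + [1] * 5

balanced_rows, balanced_labels = random_oversample(rows, labels)

print(fraud_ratio(labels))           # share of fraud before resampling
print(fraud_ratio(balanced_labels))  # share of fraud after resampling
```

Resampling is normally applied to the training split only, so that the test set still reflects the real fraud ratio.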

Introduction to fraud detection

Checking the fraud to non-fraud ratio

Plotting your data

Increasing successful detections using data resampling

Resampling methods for imbalanced data

Applying SMOTE

Compare SMOTE to original data

Fraud detection algorithms in action

Exploring the traditional way to catch fraud

Using ML classification to catch fraud

Logistic regression combined with SMOTE

Using a pipeline

Introduction and preparing your data

Now that you're familiar with the main challenges of fraud detection, you're about to learn how to flag fraudulent transactions with supervised learning. You will use classifiers, adjust them, and compare them to find the most efficient fraud detection model.
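When comparing fraud classifiers, accuracy alone can mislead, because a model that never predicts fraud already scores well on imbalanced data. The sketch below illustrates this with hand-written labels and predictions; the data and the helper names `accuracy` and `precision_recall` are assumptions for illustration, and no real classifier is trained here.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred):
    """Precision and recall for the fraud class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels: 2 fraud cases among 10 transactions.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_legit = [0] * 10                        # baseline: never predict fraud
model_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]    # made-up model output

print(accuracy(y_true, always_legit), precision_recall(y_true, always_legit))
print(accuracy(y_true, model_pred), precision_recall(y_true, model_pred))
```

Both sets of predictions reach the same accuracy here, but only the second one catches any fraud, which is why precision and recall are the more useful basis for comparing models.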

Review of classification methods

Natural hit rate

Random Forest Classifier - part 1

Random Forest Classifier - part 2

Performance evaluation

Performance metrics for the RF model

Plotting the Precision Recall Curve

Adjusting your algorithm weights

Model adjustments

Adjusting your Random Forest to fraud detection

GridSearchCV to find optimal parameters

Model results using GridSearchCV

Ensemble methods

Logistic Regression

Voting Classifier

Adjust weights within the Voting Classifier

Fraud detection using labeled data

This chapter focuses on using unsupervised learning techniques to detect fraud. You will segment customers and use K-means and other clustering algorithms to find suspicious occurrences in your data.
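One common way to use K-means for this purpose is to cluster the data and then flag the points that lie far from every cluster center. The sketch below is a minimal pure-Python version of that idea. The toy points, the choice of `k=2`, the first-k-points initialization, and the distance cutoff of 2.0 are all assumptions for illustration; in practice you would scale real features first and pick the cutoff from the distribution of distances.

```python
import math


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's algorithm; centroids start at the first k points (a simplification)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids


def distance_to_nearest_centroid(point, centroids):
    """Distance from a point to the closest cluster center."""
    return min(math.dist(point, c) for c in centroids)


# Hypothetical, already-scaled data: two tight groups plus one odd point.
points = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.2), (0.2, 0.1),
          (5.1, 4.9), (4.9, 5.1), (2.5, 9.0)]

centroids = kmeans(points, k=2)
distances = [distance_to_nearest_centroid(p, centroids) for p in points]

threshold = 2.0  # assumed cutoff, chosen by hand for this toy data
flagged = [p for p, d in zip(points, distances) if d > threshold]
print(flagged)
```

Flagged points are candidates for review rather than confirmed fraud, since unlabeled data gives no ground truth to check them against.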

Normal versus abnormal behavior

Exploring your data

Customer segmentation

Using statistics to define normal behavior

Clustering methods to detect fraud

Scaling the data

K-means clustering

Elbow method

Assigning fraud versus non-fraud

Detecting outliers

Checking model results

Other clustering fraud detection methods

DBSCAN

Assessing smallest clusters

Checking results

Fraud detection using unlabeled data

In this final chapter, you will use text data, text mining, and topic modeling to detect fraudulent behavior.
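The simplest form of text-based detection is a word search: clean each message, drop stopwords, and raise a flag when any term from a watch list appears. The sketch below shows that step only, not topic modeling. The term list, the stopword list, the sample messages, and the helper names `clean` and `flag` are all hypothetical.

```python
import re

# Hypothetical watch list and stopwords, chosen only for illustration.
SUSPICIOUS_TERMS = {"offshore", "shred", "backdate"}
STOPWORDS = {"the", "a", "an", "to", "and", "of", "at", "please"}


def clean(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def flag(text):
    """Return 1 if any cleaned token is on the watch list, else 0."""
    return int(any(t in SUSPICIOUS_TERMS for t in clean(text)))


messages = [
    "Please shred the documents and backdate the invoice.",
    "Lunch at noon?",
    "Move the funds to the offshore account.",
]

flags = [flag(m) for m in messages]
print(flags)
```

Exact token matching misses inflected forms such as "shredded", which is one reason further text cleaning is usually applied before searching.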

Using text data

Word search with dataframes

Using list of terms

Creating a flag

Text mining to detect fraud

Removing stopwords

Cleaning text data

Topic modeling on fraud

Create dictionary and corpus

LDA model

Flagging fraud based on topics

Interpreting the topic model

Finding fraudsters based on topic

Recap

Fraud detection using text

Chapter 1 datasets

Chapter 2 datasets

Chapter 3 datasets

Chapter 4 datasets

A typical organization loses an estimated 5% of its yearly revenue to fraud. In this course, you will learn how to fight fraud with data. You'll apply supervised learning algorithms to detect fraudulent behavior that resembles past cases, and unsupervised learning methods to discover new types of fraud. Fraud analytics often involves highly imbalanced datasets when classifying fraud versus non-fraud, so you will also pick up techniques for dealing with that imbalance. The course mixes technical and theoretical insights and shows you hands-on how to implement fraud detection models. You will also get tips and advice from real-life experience to help you avoid common mistakes in fraud analytics.

Unsupervised Learning in Python

Supervised Learning with scikit-learn

Learn to detect fraud with Python by resampling imbalanced data and applying supervised learning, segmentation, K-means clustering, data mining, and topic modeling.

Fraud Detection in Python

Chapter 1: Introduction and preparing your data

Chapter 2: Fraud detection using labeled data

Chapter 3: Fraud detection using unlabeled data

Chapter 4: Fraud detection using text
