1. Recap
For the final video, let's do a short recap of the things you've learned throughout this course.
2. Working with imbalanced data
During this course, you've learned how to work with highly imbalanced fraud data, which is a common problem in fraud detection. You've learned about resampling methods such as random oversampling and undersampling, and the Synthetic Minority Over-sampling Technique, or SMOTE, to adjust the class balance in your data so that your models train better.
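As a quick reminder of the simplest of these methods, here is a minimal sketch of random oversampling in plain Python. The function name `random_oversample` and the toy data are assumptions made for illustration, not something from the course exercises; in practice you would more likely reach for a dedicated resampling library.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Draw with replacement from the existing rows of this class.
        for _ in range(target - count):
            X_res.append(rng.choice(rows))
            y_res.append(label)
    return X_res, y_res


# Toy data (hypothetical): 8 legitimate transactions (0), 2 fraudulent (1).
X = [[10.0], [12.5], [9.9], [11.2], [10.7], [13.1], [9.5], [10.1],
     [950.0], [870.0]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # both classes now have 8 rows
```

Resampling like this is normally applied to the training set only, so that the test set still reflects the real class balance.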
3. Fraud detection with labeled data
In Chapter 2, you've refreshed your knowledge of supervised learning techniques such as random forests. You then looked in depth at reliable performance metrics, such as the confusion matrix, and explored the importance of the precision-recall curve. Additionally, you've learned how to adjust the parameters of your machine learning models to improve your fraud detection, and lastly, you've learned how to apply ensemble methods, which combine multiple machine learning models to improve performance.
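To recall why these metrics matter on imbalanced data, here is a small sketch that computes the confusion matrix counts, precision, and recall by hand. The helper names and the toy labels are assumptions chosen for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 as fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred):
    """Precision: share of flagged cases that are truly fraud.
    Recall: share of true fraud cases that were flagged."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels (hypothetical): 3 fraud cases among 10 transactions.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
print(tp, fp, fn, tn)
print(precision, recall, accuracy)
```

In this toy case accuracy looks better than precision and recall, even though one of the three fraud cases was missed. That gap is the reason accuracy alone is a poor guide for fraud models.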
You are also aware that in reality, truly reliable labels are often not available, so applying supervised learning is not always possible.
4. Fraud detection without labels
In Chapter 3, you've learned about the importance of segmentation to set a baseline for what is normal and abnormal behavior in your fraud data. You've worked with K-means clustering and the elbow method to apply clustering to fraud. You've further learned how to tag suspicious behavior by using the outliers of the clusters, or the smallest clusters. Lastly, you've explored the DBSCAN clustering method for fraud, which has the benefit that you don't need to specify the number of clusters beforehand.
You also know about the drawback of clustering methods for fraud detection. In reality, there is no easy way to check the validity of clustering models. It requires strong collaboration with fraud analysts to sense-check your results. The benefit of clustering for fraud detection is that your model adapts to the current data and is therefore dynamic. It doesn't rely on historically known fraud cases to learn from.
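The idea of tagging the smallest clusters can be sketched as follows, assuming you already have one cluster label per observation from a fitted clustering model. The function name `flag_smallest_clusters` and the example labels are hypothetical.

```python
from collections import Counter


def flag_smallest_clusters(labels, n_smallest=1):
    """Return one boolean per observation: True if it belongs to one
    of the n_smallest clusters by member count."""
    sizes = Counter(labels)
    # Sort clusters from smallest to largest and keep the first few.
    ranked = sorted(sizes.items(), key=lambda item: item[1])
    smallest = {cluster for cluster, _ in ranked[:n_smallest]}
    return [label in smallest for label in labels]


# Hypothetical cluster assignments for 12 observations.
labels = [0, 0, 0, 1, 0, 2, 2, 0, 0, 2, 0, 0]

flags = flag_smallest_clusters(labels)
suspicious = [i for i, flagged in enumerate(flags) if flagged]
print(suspicious)  # indices of observations in the smallest cluster
```

How many clusters to treat as suspicious is a judgment call, and a good one to make together with the fraud analysts.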
5. Text mining for fraud detection
In the last chapter, you've learned how to leverage text data in your fraud analysis by creating flags from word searches and topic models. You've learned in more detail how to properly clean text data so that you can analyze it. In many cases of fraud detection you will have some sort of text data available, such as transaction descriptions, client emails, or incident reports. That makes text mining an important skill to have for fraud detection.
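A word-search flag can be as simple as the sketch below. It assumes a very basic cleaning step (lowercasing and stripping everything except letters), and the search terms and example messages are made up for illustration.

```python
import re


def clean_text(text):
    """Lowercase the text, replace non-letter characters with spaces,
    and split it into tokens."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return text.split()


def flag_text(text, search_terms):
    """Return 1 if any search term appears as a token, else 0."""
    tokens = set(clean_text(text))
    return int(any(term in tokens for term in search_terms))


# Hypothetical search terms and messages.
search_terms = {"urgent", "offshore", "refund"}
emails = [
    "Please process the URGENT wire before noon!",
    "Lunch on Friday?",
    "Move funds to the offshore account, no questions.",
]

flags = [flag_text(email, search_terms) for email in emails]
print(flags)
```

A flag like this can then be added as an extra feature to a model or used as a filter on its own.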
6. Further learning for fraud detection
Other topics which we haven't covered in this course are the following.
Very often, fraudsters collaborate with a network of individuals in order to cover up their crime. Network analysis is an important tool not only to flag individuals committing fraud, but also to lay bare entire networks revolving around the same fraud case. I encourage you to study network analysis for fraud detection beyond this course.
During this course, we were limited to applying a handful of supervised and unsupervised learning techniques. You might want to explore neural networks for fraud detection. Neural nets typically outperform other algorithms when data gets very large and complex, which is often the case with fraud data.
And that brings us to the last tip. Fraud data can be incredibly large, especially if you're working with money transactions, website traffic, or large amounts of text data. It is wise, in such cases, to explore distributed computing systems such as Spark to speed up your analysis.
7. End of this course
And that brings us to the end of this course. I hope you learned a lot, and thank you for watching.