1. Benford's Law for fraud detection
In this lesson we'll explain how Benford's law has become an important tool in detecting fraud.
2. Many datasets satisfy Benford's Law
Data sets satisfying one of the following conditions typically conform to Benford's Law.
In general, the more orders of magnitude the data cover (at least four digits) and the more observations we have (typically 1000 or more), the more likely the dataset is to satisfy Benford's Law.
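The effect of the number of orders of magnitude can be illustrated with a small simulation. This is only a sketch in base R: the log-normal data, the parameter values and the helper names `first_digit` and `max_deviation` are assumptions made for illustration, not part of the lesson.

```r
set.seed(42)  # arbitrary seed so the simulation is reproducible

# First significant digit of each positive number.
first_digit <- function(x) floor(x / 10^floor(log10(x)))

# Expected first-digit probabilities under Benford's Law.
expected <- log10(1 + 1 / (1:9))

# Largest absolute gap between observed and expected first-digit frequencies.
max_deviation <- function(x) {
  observed <- as.numeric(table(factor(first_digit(x), levels = 1:9))) / length(x)
  max(abs(observed - expected))
}

# Simulated log-normal data: a larger sdlog spreads the values over more
# orders of magnitude (hypothetical parameters, not data from the lesson).
narrow <- rlnorm(10000, meanlog = 5, sdlog = 0.2)
wide   <- rlnorm(10000, meanlog = 5, sdlog = 2)

dev_narrow <- max_deviation(narrow)
dev_wide   <- max_deviation(wide)
print(c(narrow = dev_narrow, wide = dev_wide))
```

The sample concentrated around a single order of magnitude should show a much larger deviation than the sample spread over several orders of magnitude.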
3. For example
Here are some examples.
4. Benford's Law for fraud detection
We have already seen that many real-life datasets conform to Benford's Law. Most financial data and accounting numbers conform to Benford's Law.
Fraudsters typically change the dataset by adding fake numbers, which do not follow Benford's Law. Because of these abnormal duplications and atypical numbers, the dataset no longer conforms to Benford's Law. Hence Benford's Law is very convenient for fraud detection, since it identifies deviations that need further review. It is even legally admissible as evidence in the US in criminal cases at the federal, state and local levels.
For example, Benford's Law was used as evidence of voter fraud in the 2009 Iranian election.
Mark Nigrini showed that Benford's Law could be used in forensic accounting and auditing as an indicator of accounting and expenses fraud.
5. Be careful
Not every dataset has to conform to Benford's Law, and many never will: for example, if the data have a maximum or minimum, or if the data lie in a narrow interval. It also does not work if the numbers are identification numbers, like social security numbers, or when the fluctuations are additive instead of multiplicative.
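Two of these cases can be reproduced with simulated data. The sketch below uses base R only; the heights, the six-digit identification numbers and the helper names are hypothetical and serve purely as an illustration.

```r
set.seed(1)  # arbitrary seed so the simulation is reproducible

# First significant digit of each positive number.
first_digit <- function(x) floor(x / 10^floor(log10(x)))

# Observed relative frequency of each first digit 1, ..., 9.
first_digit_freq <- function(x) {
  as.numeric(table(factor(first_digit(x), levels = 1:9))) / length(x)
}

# Data in a narrow interval: simulated body heights in cm (hypothetical).
heights <- rnorm(1000, mean = 170, sd = 10)

# Identification numbers: random six-digit IDs (hypothetical).
ids <- sample(100000:999999, 1000)

comparison <- data.frame(
  digit   = 1:9,
  benford = log10(1 + 1 / (1:9)),
  heights = first_digit_freq(heights),
  ids     = first_digit_freq(ids)
)
print(round(comparison, 3))
```

Nearly all heights start with a 1, while the first digits of the identification numbers are spread roughly evenly, so neither column resembles the Benford probabilities.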
6. Benford's Law for the first-two digits
A dataset satisfies Benford's Law if the first two digits d_1d_2 appear with the following probability.
P(D_1D_2 = d_1d_2) = log_10(1 + 1/(d_1d_2)), for d_1d_2 in {10, 11, ..., 99}
We already implemented this function in R in the previous lesson, so we can pass it any number from 1 to 99 as input.
This test is often preferred for fraud detection since it captures more information than the first-digit and second-digit tests combined.
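As a reminder of what such a function might look like, here is a minimal sketch. The name `benlaw` is an assumption; the function written in the previous lesson may be named or structured differently.

```r
# Benford probability of a leading digit sequence d
# (d in 1..9 for the first digit, d in 10..99 for the first two digits).
benlaw <- function(d) log10(1 + 1 / d)

first_digit_probs <- benlaw(1:9)    # probabilities for first digits 1, ..., 9
first_two_probs   <- benlaw(10:99)  # probabilities for first-two digits 10, ..., 99

# Each set of probabilities sums to 1.
print(sum(first_digit_probs))
print(sum(first_two_probs))

# The first-digit probability equals the sum over the matching two-digit
# block, e.g. P(D_1 = 1) = P(D_1D_2 = 10) + ... + P(D_1D_2 = 19).
print(all.equal(benlaw(1), sum(benlaw(10:19))))
```

The last check shows how the first-two digits distribution refines the first-digit distribution: every first-digit probability is split over ten two-digit cells.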
7. Census data
Let us look again at the census data, containing the populations of 19509 towns and cities of the United States. It is clear that this data also conforms to Benford's Law for the first-two digits.
The input parameter number.of.digits indicates how many leading digits are analyzed.
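To make the role of such a parameter concrete, the sketch below extracts the leading digits in base R and compares the first-two digits frequencies with Benford's Law. The helper `leading_digits` and its argument `number_of_digits` are hypothetical stand-ins, and the populations are simulated rather than taken from the census data.

```r
set.seed(2009)  # arbitrary seed so the simulation is reproducible

# Leading digits of each non-zero number; number_of_digits plays the role
# of the number.of.digits parameter mentioned above (hypothetical helper).
leading_digits <- function(x, number_of_digits = 2) {
  x <- abs(x[x != 0])
  floor(x / 10^(floor(log10(x)) - number_of_digits + 1))
}

leading_digits(c(19509, 7321, 128), number_of_digits = 2)  # 19 73 12
leading_digits(c(19509, 7321, 128), number_of_digits = 1)  # 1 7 1

# Simulated stand-in for town populations (NOT the real census data).
populations <- round(rlnorm(19509, meanlog = 8, sdlog = 2))
populations <- populations[populations >= 10]  # keep numbers with two digits

digits   <- leading_digits(populations, number_of_digits = 2)
observed <- as.numeric(table(factor(digits, levels = 10:99))) / length(digits)
expected <- log10(1 + 1 / (10:99))

# Largest absolute gap between observed and expected frequencies.
max_gap <- max(abs(observed - expected))
print(max_gap)
```

A small maximum gap across all 90 two-digit cells indicates that the simulated data follow the first-two digits distribution closely.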
8. Employee reimbursements
An internal audit department needs to check the employee reimbursements for fraud. Employee Sebastiaan has claimed reimbursement for 1000 business meals and travel expenses in the last 5 years by submitting scanned images of his receipts.
Let us check these reimbursements for fraud using Benford's Law.
9. Analysis with Benford's Law for first digit
The dataset clearly has fewer 1's and more 7's than expected under Benford's Law.
10. Analysis with Benford's Law for first-two digits
When comparing with the expected frequencies for the first-two digits, we again notice more numbers starting with 7 than expected.
The internal audit department also investigated the reimbursements of other employees and most of them conform to Benford's Law.
Therefore they decide to investigate the most deviating ones, like this one from Sebastiaan. After analyzing his reimbursements starting with 7, they found that Sebastiaan had changed the leading 1 to a 7 on a third of his expenses starting with a 1 before scanning the receipts, in order to be reimbursed more money.
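The kind of manipulation described here can be mimicked on simulated data to see how it shows up in the first-digit frequencies. This is a sketch under stated assumptions: the expense amounts are simulated log-normal values, not Sebastiaan's actual reimbursements, and all helper names are hypothetical.

```r
set.seed(7)  # arbitrary seed so the simulation is reproducible

# First significant digit of each positive number.
first_digit <- function(x) floor(x / 10^floor(log10(x)))

# Observed relative frequency of each first digit 1, ..., 9.
first_digit_freq <- function(x) {
  as.numeric(table(factor(first_digit(x), levels = 1:9))) / length(x)
}

# 1000 simulated honest expense amounts (hypothetical data).
honest <- rlnorm(1000, meanlog = 4, sdlog = 1.5)

# Tampering: on one third of the amounts starting with 1, replace the
# leading 1 by a 7 (for example 123.40 becomes 723.40).
starts_with_1 <- which(first_digit(honest) == 1)
changed <- sample(starts_with_1, size = round(length(starts_with_1) / 3))
tampered <- honest
tampered[changed] <- honest[changed] + 6 * 10^floor(log10(honest[changed]))

comparison <- data.frame(
  digit    = 1:9,
  benford  = log10(1 + 1 / (1:9)),
  honest   = first_digit_freq(honest),
  tampered = first_digit_freq(tampered)
)
print(round(comparison, 3))
```

In the tampered column the share of amounts starting with 1 falls below the Benford probability and the share starting with 7 rises above it, which is the pattern the audit department noticed.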
11. Let's practice!
Now let's try some examples.