Case study: election fraud

1. Case study: election fraud

Welcome to the final chapter of the course on inference for categorical data. We have one more statistical test to add to our toolbox - the goodness of fit test - but let's introduce it in the context of a case study on election fraud.

2. Election fraud

3. Election fraud

An official perpetrating election fraud would alter the vote totals before reporting them to the authorities, usually in a manner that favors one of the candidates, in this case candidate B. If the original ballot data is not retained, this sort of fraud may be difficult to detect and prove. This has led some to look to a statistical method based on something called Benford's Law.

4. Benford’s Law A.K.A. "the first digit law"

Benford's Law appears when looking at broad collections of numbers and paying attention only to the first digit of each number. For example, consider the populations of every country on earth. This demographic data lives inside the gapminder package. We can restrict our attention to data from 2007 and select the country name and population variables. Now Benford's Law is only concerned with the first digit, so here we see that the populations of Afghanistan, Albania, and Algeria all start with three, Angola's starts with one, and so on. If we do this for all 142 countries in this dataset and visualize the distribution in a bar chart, what do you think it would look like? Would it be uniform? It turns out it's not, not by a long shot. The most common first digit is one, the second most common is two, and so on, with the higher digits being less common. This decaying distribution of first digits is what is captured precisely by Benford's Law. It proposes, for example, that about 30.1 percent of the first digits should be one, 17.6 percent should be two, and so on.
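The percentages quoted for Benford's Law come from its usual statement, P(d) = log10(1 + 1/d) for a first digit d from 1 to 9. The sketch below is in Python rather than the R used in the course, and the names `first_digit` and `benford` are illustrative, not part of any package. The sample numbers are made up purely to show the digit extraction; they are not gapminder values.

```python
import math

def first_digit(n):
    """Return the leading digit of a positive integer."""
    return int(str(n)[0])

# Expected share of each first digit under Benford's Law:
# P(d) = log10(1 + 1/d), for d = 1, ..., 9.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Made-up values, used only to illustrate first-digit extraction.
sample = [31_000_000, 3_500_000, 12_000_000, 870]
sample_digits = [first_digit(n) for n in sample]

for d, p in benford.items():
    print(f"{d}: {p:.3f}")
```

The nine probabilities sum to one, and each digit is expected less often than the one before it, which is the decaying shape described above.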

5. Benford’s Law A.K.A. "the first digit law"

To think about why this pattern emerges, imagine if these populations were drawn randomly from the integers between 1 and 150. What proportion of these numbers lead with a 1? Well, there's 1, there's 10 through 19, then there's the whole swath of 100 to 150. That's 62 of the 150 numbers, or more than 40%. To see why two is the second most common, imagine if instead these numbers were between 1 and 250. That would be a lot of leading twos. The first digits come to follow Benford's Law when the numbers span many orders of magnitude. The idea behind using Benford's Law to detect election fraud is that in a free and fair election, the distribution of first digits of the vote counts should roughly follow Benford's Law. If instead election officials are fiddling with the totals manually, they will tend to use leading digits drawn more uniformly between 1 and 9; more sevens, for example. This approach to detecting election fraud was prominently used in the 2009 presidential election in Iran.
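The counting argument for the ranges 1 to 150 and 1 to 250 can be checked directly. This is a minimal sketch in Python rather than the course's R, and the helper name `leading_digit_shares` is hypothetical.

```python
from collections import Counter

def first_digit(n):
    """Return the leading digit of a positive integer."""
    return int(str(n)[0])

def leading_digit_shares(upper):
    """Share of the integers 1..upper that start with each digit 1-9."""
    counts = Counter(first_digit(n) for n in range(1, upper + 1))
    return {d: counts[d] / upper for d in range(1, 10)}

shares_150 = leading_digit_shares(150)
shares_250 = leading_digit_shares(250)

print(f"1..150, leading 1: {shares_150[1]:.3f}")
print(f"1..250, leading 1: {shares_250[1]:.3f}")
print(f"1..250, leading 2: {shares_250[2]:.3f}")
```

In the range 1 to 250, one is still the most common leading digit (1, 10 through 19, and 100 through 199), two comes second, and every other digit leads only 11 of the 250 numbers.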

6. Iran election 2009

In this election, the incumbent, Mahmoud Ahmadinejad, faced several challengers, the most prominent of whom was Mir-Hossein Mousavi. There were widespread claims of election fraud from both the international community and some parties within Iran. One of the key pieces of evidence used in arguing fraud was the vote counts, which are available in a dataset called Iran.

7. Let's practice!

In the following exercises, you'll have the chance to get your hands on this real election data and use Benford's Law to evaluate the claim of election fraud.
