1. Election fraud in Iran and Iowa: debrief
OK, at this point, you're probably thinking: wait, what?
2. Iowa election fraud
Were officials in Iowa really altering vote totals?!
3. Iowa election fraud
Political scientists and election observers assure us that the answer to this question is: no. The 2016 presidential election was largely free of this sort of election fraud.
4. Iowa election fraud
So what happened? Why did your hypothesis test give you the wrong answer?
5. What went wrong?
One possibility that you should always consider when you reject the null hypothesis is the possibility that you have made a type I error, that is, rejecting the null hypothesis when it is true.
6. What went wrong?
If your test is performing properly, you know the probability of this event exactly. In fact, you get to set it: it's the significance level, your threshold for rejecting the null hypothesis, which is commonly set to 5%.
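To see why the type I error rate equals the threshold you choose, here is a minimal simulation sketch. It does not use the election data. It swaps in a simple two-sided z-test on normally distributed data with a known standard deviation, chosen only because its error rate is easy to check, and the sample size, trial count, and seed are arbitrary assumptions.

```python
import random
from statistics import NormalDist, fmean

ALPHA = 0.05          # significance level: the type I error rate you choose
N_PER_SAMPLE = 30     # hypothetical sample size for each simulated study
N_TRIALS = 20_000     # number of simulated studies in which the null is true


def rejects_null(sample, alpha=ALPHA):
    """Two-sided z-test of H0: mean = 0, assuming a known standard deviation of 1."""
    z = fmean(sample) * len(sample) ** 0.5
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical


rng = random.Random(1)  # fixed seed so the run is repeatable

# Every sample is drawn from a world where the null hypothesis is true,
# so every rejection counted here is a type I error.
rejections = sum(
    rejects_null([rng.gauss(0, 1) for _ in range(N_PER_SAMPLE)])
    for _ in range(N_TRIALS)
)
type_i_rate = rejections / N_TRIALS
print(f"Share of true nulls rejected: {type_i_rate:.3f}")
```

The printed share should land close to 0.05. If you change `ALPHA`, the share of false rejections should move with it.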
7. What went wrong?
So you may have made a type I error, but a more fundamental issue might be at work here. To be precise, you tested the hypothesis that the first-digit distribution given by Benford's Law is a good fit to the first-digit distribution of your data.
8. What went wrong?
What if Benford's Law doesn't actually apply to this sort of data, even in a free and fair election? To get that nicely shaped, decaying distribution of first digits, the counts need to span many orders of magnitude and be roughly uniformly distributed on a logarithmic scale.
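The condition above can be checked with a small sketch. Benford's Law gives the probability of first digit d as log10(1 + 1/d). The code below compares that with the first-digit frequencies of two synthetic sets of counts: one that is uniform on a log scale across six orders of magnitude, and one confined to a narrow range. Both datasets are made up for illustration and are not vote totals; the function names, ranges, and seed are assumptions.

```python
import math
import random


def benford_probs():
    """Benford's Law: P(first digit = d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(n):
    """Leading digit of a positive integer."""
    return int(str(n)[0])


def first_digit_freqs(values):
    """Relative frequency of each first digit 1..9 in a list of positive integers."""
    counts = {d: 0 for d in range(1, 10)}
    for v in values:
        counts[first_digit(v)] += 1
    return {d: c / len(values) for d, c in counts.items()}


rng = random.Random(42)  # fixed seed so the run is repeatable

# Synthetic counts that are uniform in their logarithms and span six orders of magnitude.
wide = [int(10 ** rng.uniform(0, 6)) for _ in range(100_000)]

# Synthetic counts confined to a narrow range (200 to 899): no first digit of 1 is possible.
narrow = [rng.randint(200, 899) for _ in range(100_000)]

expected = benford_probs()
wide_freqs = first_digit_freqs(wide)
narrow_freqs = first_digit_freqs(narrow)

print("digit  Benford  wide   narrow")
for d in range(1, 10):
    print(f"{d:>5}  {expected[d]:.3f}    {wide_freqs[d]:.3f}  {narrow_freqs[d]:.3f}")
```

The wide dataset should track the Benford column closely, while the narrow one cannot, whatever process generated it. That is the concern with counts that do not span many orders of magnitude.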
9. What went wrong?
Certain phenomena like populations of world cities fit these criteria and are a good fit for Benford's Law. But it's not clear that vote totals are a good fit for this distribution.
10. Take-home lesson
This is an important lesson: you need to be confident that an analytical method is appropriate for the kind of data you're studying before you put much weight on its result. When in doubt, one solution is to do as you've done here: run a sanity check with an analysis where you know the correct answer. In this case, your Iowa analysis should make you skeptical of the Iran analysis. Of course, there may be other, more convincing sources of evidence that the 2009 Iran election was fraudulent, but this particular analysis, based on Benford's Law, can't really tell us much either way.
11. Methods for categorical data
Alright, that's it, let's wrap up this course. In case you've lost track, you've learned several techniques to carry out inference on categorical data. You started with confidence intervals on one proportion and on the difference in two proportions, then shifted to the other side of the coin with hypothesis tests on proportions. From there, you moved into the case where you need to assess the independence of two categorical variables that have many different levels. That led you to the chi-squared test of independence. In this chapter, you looked at another use for the chi-squared statistic: testing the goodness of fit of a dataset to a particular categorical distribution.
12. Hypothesis test
That might seem like a long list to remember, but really all you need to internalize are the steps that every test follows:
13. Hypothesis test
You specify the variable or variables of interest,
14. Hypothesis test
you propose your null hypothesis,
15. Hypothesis test
you generate datasets that would appear in a world where that hypothesis is true,
16. Hypothesis test
then you calculate an appropriate test statistic for each one of those datasets.
17. Hypothesis test
The final step is to simply assess the distribution of these null statistics and see if the statistic that you actually observed is consistent or inconsistent with it.
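The steps above can be sketched as a simulation-based goodness-of-fit test. This is a minimal illustration that uses only the Python standard library, not the course's own code. The null distribution is Benford's Law, and the two sets of "observed" counts are synthetic, invented purely for illustration; they are not the Iowa or Iran data. The function names, number of replicates, and seed are assumptions.

```python
import math
import random


def chi_squared(observed, expected):
    """Chi-squared distance between observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulate_gof_p_value(observed_counts, null_probs, reps=1000, seed=0):
    """Simulation-based goodness-of-fit test; returns (observed statistic, p-value)."""
    rng = random.Random(seed)
    n = sum(observed_counts)
    expected = [n * p for p in null_probs]
    observed_stat = chi_squared(observed_counts, expected)

    categories = list(range(len(null_probs)))
    null_stats = []
    for _ in range(reps):
        # Generate a dataset from a world where the null hypothesis is true.
        draws = rng.choices(categories, weights=null_probs, k=n)
        counts = [0] * len(null_probs)
        for c in draws:
            counts[c] += 1
        # Calculate the test statistic for that simulated dataset.
        null_stats.append(chi_squared(counts, expected))

    # Assess where the observed statistic falls in the null distribution.
    p_value = sum(s >= observed_stat for s in null_stats) / reps
    return observed_stat, p_value


# Variable of interest: first digit (1..9). Null hypothesis: it follows Benford's Law.
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Synthetic counts for illustration only (not real election data).
benford_like = [301, 176, 125, 97, 79, 67, 58, 51, 46]  # roughly Benford-shaped, n = 1000
flat = [111] * 9                                         # every digit equally common, n = 999

stat_close, p_close = simulate_gof_p_value(benford_like, benford)
stat_flat, p_flat = simulate_gof_p_value(flat, benford)
print(f"Benford-like counts: statistic = {stat_close:.2f}, p-value = {p_close:.3f}")
print(f"Flat counts:         statistic = {stat_flat:.2f}, p-value = {p_flat:.3f}")
```

The Benford-like counts should be consistent with the null distribution (a large p-value), while the flat counts should not (a p-value near zero). Swapping in a different null distribution or test statistic changes the details but not the steps.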
18. Let's practice!
Congratulations on making it to the end of the course!