
Test of correlation

1. Test of correlation

Consider again the swing state county-level voting data.

2. 2008 US swing state election results

In the prequel to this course, we computed the Pearson correlation coefficient between Obama's vote share and the total number of votes. Remember that the Pearson correlation coefficient is a measure of how much of the variability in two variables is due to them being correlated. It ranges from -1 for totally negatively correlated to 1 for totally positively correlated. We got a value of about 0.54. This value of the Pearson correlation indicates that the data are not perfectly correlated, but are correlated nonetheless. But how can we tell whether this correlation is real, or whether it could have happened just by chance?
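
As a reminder of what that coefficient is, here is a minimal sketch of the standard Pearson formula (the covariance divided by the product of the standard deviations), written with only the Python standard library. The helper name `pearson_r` and the small lists are made up for illustration; they are not the election data. With NumPy, `np.corrcoef` would play the same role.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations (proportional to the covariance)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (proportional to the variances)
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative values, not the county-level data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_pos = pearson_r(x, [2.0, 4.0, 6.0, 8.0, 10.0])  # totally positively correlated
r_neg = pearson_r(x, [10.0, 8.0, 6.0, 4.0, 2.0])  # totally negatively correlated
r_mid = pearson_r(x, [2.0, 1.0, 4.0, 3.0, 5.0])   # correlated, but not perfectly

print(r_pos, r_neg, r_mid)
```

The first two cases sit at the ends of the range, and the third falls in between, much like the election data does.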

3. Hypothesis test of correlation

We can do a hypothesis test! We posit a null hypothesis that there is no correlation between the two variables, in this case Obama's vote share and total votes. We then simulate the election assuming the null hypothesis is true (which you will figure out how to do in the exercises), and use the Pearson correlation coefficient as the test statistic. The p-value is then the fraction of replicates that have a Pearson correlation coefficient at least as large as what was observed. I did this procedure, and in all 10,000 of my replicates under the null hypothesis,

4. More populous counties voted for Obama

not one had a Pearson correlation coefficient as high as the observed value of 0.54. I tried generating 100,000, and then a million replicates. In all cases, not one replicate had a Pearson correlation coefficient as high as 0.54. This does not mean that the p-value is zero. It means that it is so low that we would have to generate an enormous number of replicates to have even one that has a test statistic at least as extreme as the one observed. We conclude that the p-value

5. More populous counties voted for Obama

is very, very small and there is essentially no doubt that counties with higher vote counts tended to vote for Obama. After all, that is how he won the election.
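
To make the procedure concrete, here is a rough sketch of one common way to simulate data under the null hypothesis of no correlation: shuffle one variable while leaving the other fixed, and recompute the Pearson correlation each time. That this is the approach intended for the exercises is an assumption. The function names, the seeds, and the synthetic data are all made up for illustration and are not the election data. With NumPy, `np.random.permutation` and `np.corrcoef` would play the same roles.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(x, y, n_reps=10_000, seed=42):
    """Fraction of permutation replicates with r at least as large as observed."""
    rng = random.Random(seed)
    r_obs = pearson_r(x, y)
    x_perm = list(x)  # copy so the caller's data is left untouched
    count = 0
    for _ in range(n_reps):
        # Shuffling x breaks any pairing with y, which simulates the null
        rng.shuffle(x_perm)
        if pearson_r(x_perm, y) >= r_obs:
            count += 1
    return r_obs, count / n_reps


# Synthetic stand-in data (hypothetical, NOT the real county-level results)
data_rng = random.Random(0)
total_votes = [data_rng.uniform(0, 100) for _ in range(100)]
vote_share = [40 + 0.1 * v + data_rng.gauss(0, 5) for v in total_votes]

r_obs, p_value = permutation_p_value(total_votes, vote_share, n_reps=2000)
print("observed r:", r_obs)
print("p-value estimate:", p_value)
```

If no replicate reaches the observed value, the estimate comes out as zero, which only tells us the true p-value is smaller than roughly one over the number of replicates.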

6. Let's practice!

Now it is your turn to think about how to do a hypothesis test on correlation and execute it!