Paired t-tests

1. Paired t-tests

Previously, you used the t-distribution to compute a p-value from a standardized test statistic related to the difference in two means across two groups.

2. US Republican presidents dataset

Here's a dataset of US presidential elections. Each row represents a presidential election at the county level. The variables in the dataset are the US state, the county within that state, and the percentage of votes that went to the Republican candidate in 2008 and in 2012.

3. Hypotheses

One question is whether the percentage of votes given to the Republican candidate was lower in 2008 compared to 2012. To test this, we form hypotheses. As before, the null hypothesis is that our hunch is wrong, and that the population parameters are the same in each year group. The alternative hypothesis is that the parameter in 2008 was lower than in 2012. I'm setting a significance level of point-zero-five. One feature of this dataset is that the 2008 votes and the 2012 votes are paired, since they both refer to the same county. That is, the 2008 and 2012 values aren't independent from each other. Some voting patterns may occur due to county-level demographics and local politics. We want to capture this pairing in our model.

4. From two samples to one

For paired analyses, rather than considering the two variables separately, we consider a single variable of the difference, here the 2008 percentage minus the 2012 percentage. In this histogram of the difference, most values are between minus ten and ten, with a few outliers.
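As a rough sketch of this step in base R, the difference can be stored as a new column and then plotted. The real dataset isn't reproduced here, so the data frame below is simulated, and the names sample_data, repub_percent_08, repub_percent_12 and diff are assumptions made for illustration.

```r
# Simulated stand-in for the county-level data (NOT the real election results).
set.seed(2008)
n_counties <- 500
repub_percent_08 <- runif(n_counties, min = 20, max = 80)
# Each 2012 value is tied to the same county's 2008 value, plus some noise.
repub_percent_12 <- repub_percent_08 + rnorm(n_counties, mean = 2.6, sd = 3.6)
sample_data <- data.frame(repub_percent_08, repub_percent_12)

# One variable instead of two: the 2008 percentage minus the 2012 percentage.
sample_data$diff <- sample_data$repub_percent_08 - sample_data$repub_percent_12

# Histogram of the differences.
hist(sample_data$diff,
     breaks = 30,
     main = "Difference in Republican vote share (2008 minus 2012)",
     xlab = "Percentage points")
```

Because the difference is taken row by row, each county is only ever compared with itself.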

5. Calculate sample statistics of the difference

The sample mean, x-bar, is calculated on this difference. It is minus two-point-six-four.

6. Revised hypotheses

We can restate the hypotheses in terms of a single population mean, mu-diff. The null hypothesis is that mu-diff equals zero, and the alternative hypothesis is that mu-diff is less than zero. The test statistic, t, has a slightly simpler equation compared to the two sample case. We have one statistic, so the number of degrees of freedom is the number of rows in the sample minus one.
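Written out, the hypotheses and the test statistic look roughly like this. The symbol s-diff, for the sample standard deviation of the differences, is an assumed name that the narration doesn't use.

```latex
\begin{aligned}
H_0 &: \mu_{\text{diff}} = 0 \\
H_A &: \mu_{\text{diff}} < 0 \\
t &= \frac{\bar{x}_{\text{diff}} - \mu_{\text{diff}}}{\sqrt{s_{\text{diff}}^2 / n_{\text{diff}}}} \\
\text{df} &= n_{\text{diff}} - 1
\end{aligned}
```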

7. Calculating the p-value

To calculate the test statistic, we need the number of rows in the dataset, five hundred. And we need the standard deviation of the differences. We already know x-bar, the mean of the differences. Assuming the null hypothesis is true means mu-diff is zero. We now have everything we need to plug into the equation to calculate t. It's minus sixteen. The degrees of freedom are one less than n-diff at four hundred and ninety nine. Finally, we transform t with the t-distribution CDF. The p-value is really small at ten to the minus forty seven. That means we reject the null hypothesis in favor of the alternative hypothesis that the Republican candidate got a smaller percentage of the vote in 2008 compared to 2012.
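The calculation described above can be sketched in base R as follows. The data are simulated, so the numbers printed won't match the ones quoted in the narration, and all variable names are assumptions chosen for illustration.

```r
# Simulated paired data (NOT the real election results).
set.seed(2008)
n_counties <- 500
repub_percent_08 <- runif(n_counties, min = 20, max = 80)
repub_percent_12 <- repub_percent_08 + rnorm(n_counties, mean = 2.6, sd = 3.6)
vote_diff <- repub_percent_08 - repub_percent_12

# Ingredients of the test statistic.
n_diff <- length(vote_diff)   # number of rows
xbar_diff <- mean(vote_diff)  # sample mean of the differences
s_diff <- sd(vote_diff)       # sample standard deviation of the differences
mu_diff <- 0                  # value assumed under the null hypothesis

# Test statistic and degrees of freedom.
t_stat <- (xbar_diff - mu_diff) / sqrt(s_diff^2 / n_diff)
degrees_of_freedom <- n_diff - 1

# Left-tailed p-value from the t-distribution CDF, because the alternative is "less".
p_value <- pt(t_stat, df = degrees_of_freedom)

print(c(t = t_stat, df = degrees_of_freedom, p = p_value))
```

pt() returns the lower-tail probability by default, which is the tail needed for a "less than" alternative.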

8. Testing differences between two means using t.test()

That was a lot of calculating. Fortunately, there's an easier way using t-dot-test. It works with vectors, so the first argument is the vector of differences. The type of alternative hypothesis can be two-sided, less or greater. Finally, you specify the value of mu-diff from the null hypothesis. Zero is the default, so strictly speaking we didn't need to specify it. Here's the output. You should recognize the value of the test statistic and the degrees of freedom, as well as x-bar on the last line. The p-value is written as "less than two-point-two times ten to the minus sixteen". p-values smaller than this are less reliable due to computational accuracy constraints, so R doesn't print them exactly, but this is consistent with the number we calculated before.
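A call along these lines might look like the sketch below. The data are simulated rather than the real election results, and the variable names are assumptions.

```r
# Simulated paired data (NOT the real election results).
set.seed(2008)
n_counties <- 500
repub_percent_08 <- runif(n_counties, min = 20, max = 80)
repub_percent_12 <- repub_percent_08 + rnorm(n_counties, mean = 2.6, sd = 3.6)
vote_diff <- repub_percent_08 - repub_percent_12

# One-sample t-test on the vector of differences.
test_result <- t.test(
  vote_diff,             # vector of differences
  alternative = "less",  # alternative hypothesis: mu-diff is less than zero
  mu = 0                 # value under the null hypothesis (also the default)
)
print(test_result)

# The individual pieces are stored on the returned object.
test_result$statistic  # t
test_result$parameter  # degrees of freedom
test_result$estimate   # x-bar, the mean of the differences
test_result$p.value    # the unrounded p-value
```

The printed summary abbreviates very small p-values, while the p.value element holds the number R actually computed.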

9. t.test() with paired = TRUE

There's a variation of t-dot-test for paired data that requires even less work. Rather than calculating the difference between the two paired variables, you can just pass them both directly to t-dot-test and set paired to TRUE. Notice that all the numbers are the same.
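As a quick check on simulated data, the sketch below runs both forms and compares them. The variable names are assumptions, and the values are not the real election results.

```r
# Simulated paired data (NOT the real election results).
set.seed(2008)
n_counties <- 500
repub_percent_08 <- runif(n_counties, min = 20, max = 80)
repub_percent_12 <- repub_percent_08 + rnorm(n_counties, mean = 2.6, sd = 3.6)

# Paired test: pass both vectors and let t.test() take the differences.
paired_result <- t.test(
  repub_percent_08,
  repub_percent_12,
  paired = TRUE,
  alternative = "less",
  mu = 0
)

# One-sample test on the differences computed by hand, for comparison.
diff_result <- t.test(
  repub_percent_08 - repub_percent_12,
  alternative = "less",
  mu = 0
)

print(paired_result)

# Both approaches should give the same statistic, degrees of freedom and p-value.
same_t <- isTRUE(all.equal(unname(paired_result$statistic), unname(diff_result$statistic)))
same_df <- isTRUE(all.equal(unname(paired_result$parameter), unname(diff_result$parameter)))
same_p <- isTRUE(all.equal(paired_result$p.value, diff_result$p.value))
print(c(same_t = same_t, same_df = same_df, same_p = same_p))
```

The order of the two vectors matters: the difference is the first argument minus the second.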

10. Unpaired t.test()

If we don't set paired to TRUE and instead perform an unpaired t-test, then the numbers change. The test statistic is closer to zero, there are more degrees of freedom, and the p-value is much larger. Performing an unpaired t-test when the data are paired increases the chance of a false negative error.
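To see the effect side by side, here is a sketch on simulated data with assumed variable names. Note that when paired is left at its default of FALSE, t.test() performs Welch's unequal-variance test unless var.equal = TRUE is set.

```r
# Simulated paired data (NOT the real election results).
set.seed(2008)
n_counties <- 500
repub_percent_08 <- runif(n_counties, min = 20, max = 80)
repub_percent_12 <- repub_percent_08 + rnorm(n_counties, mean = 2.6, sd = 3.6)

# Paired test: uses the county-by-county differences.
paired_result <- t.test(repub_percent_08, repub_percent_12,
                        paired = TRUE, alternative = "less")

# Unpaired test: treats the two years as independent samples.
unpaired_result <- t.test(repub_percent_08, repub_percent_12,
                          alternative = "less")

# Compare the key numbers from each test.
comparison <- data.frame(
  test = c("paired", "unpaired"),
  t = c(unname(paired_result$statistic), unname(unpaired_result$statistic)),
  df = c(unname(paired_result$parameter), unname(unpaired_result$parameter)),
  p_value = c(paired_result$p.value, unpaired_result$p.value)
)
print(comparison)
```

In this simulation the spread between counties is much larger than the spread of the within-county differences, so the unpaired test has a larger standard error and a weaker result.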

11. Let's practice!

Time to perform some pairing.
