t-interval for paired data

1. t-interval for paired data

In this video we discuss how we estimate the mean difference between data coming from two dependent groups, in other words, paired data. First, the good news: there's not much new here. We'll quickly see that we can summarize paired data in a way that allows us to re-use techniques we've already learned.

2. High School and Beyond

As usual, let's frame our discussion around a real-world problem. 200 observations were randomly sampled from the High School and Beyond survey. The same students took both a reading test and a writing test. At first glance, how are the distributions of reading and writing scores similar and how are they different? It appears that the median writing score is slightly higher than the median reading score. Both distributions seem fairly symmetric, but the reading scores are slightly more right skewed, as evidenced by a median that is closer to the 25th percentile than to the 75th percentile. Also, the reading scores are slightly more variable than the writing scores. All that being said, at first glance it's difficult to tell whether there's a difference between the reading and writing scores.

3. Independent scores?

Can the reading and writing scores for a given student be assumed to be independent of each other? Probably not: a student's reading score is likely not independent of their writing score. If they are generally a high-achieving student, they're likely to score highly on both tests.

4. Analyzing paired data

When two sets of observations have this special correspondence, in other words, when they're not independent, they're said to be paired. To analyze paired data, it is often useful to look at the difference in outcomes for each pair of observations. So here, for example, for each student we subtract their writing score from their reading score and create a new variable, called diff, that holds the difference between the two scores for that student.
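As a minimal sketch of this step, the snippet below builds the diff variable from two lists of paired scores. The scores are made up for illustration and are not the survey data, and the list names read and write are assumptions chosen to match the narration.

```python
# Made-up scores for five students (illustrative only, not the survey data).
# Position i in both lists belongs to the same student, which is what makes
# the data paired.
read = [52, 60, 47, 63, 55]
write = [54, 59, 52, 61, 58]

# For each student, subtract the writing score from the reading score
# (read minus write).
diff = [r - w for r, w in zip(read, write)]

print(diff)
```

Because the subtraction is read minus write, a negative value means that student scored higher on writing than on reading.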

5. Estimating the mean difference in paired data

Our goal is to construct a 95% confidence interval for the mean difference between the reading and writing scores.

6. Estimating the mean difference in paired data

And as we said at the beginning of the video, implementation-wise there is not much new here. Our data is in a single column, diff, so we simply build an interval for the mean of this single variable using the t.test function.
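The video does this with R's t.test function. To show what such a one-sample t-interval computes, here is a rough Python sketch using only the standard library. The scores are made up (not the survey data), and the critical value 2.776 is the standard t-table value for 95% confidence with 4 degrees of freedom, hard-coded here as an assumption that only holds for a sample of five pairs.

```python
import math
import statistics

# Made-up paired scores for five students (illustrative only).
read = [52, 60, 47, 63, 55]
write = [54, 59, 52, 61, 58]
diff = [r - w for r, w in zip(read, write)]  # read minus write

n = len(diff)
mean_diff = statistics.mean(diff)    # sample mean of the differences
sd_diff = statistics.stdev(diff)     # sample standard deviation (n - 1 denominator)
se = sd_diff / math.sqrt(n)          # standard error of the mean difference

# Critical value t* for 95% confidence with df = n - 1 = 4, taken from a
# t-table. This constant is only valid for n = 5; a different sample size
# needs a different critical value.
T_STAR_DF4 = 2.776

lower = mean_diff - T_STAR_DF4 * se
upper = mean_diff + T_STAR_DF4 * se

print(f"95% CI for the mean difference: ({lower:.2f}, {upper:.2f})")
```

Once the pairs are reduced to a single column of differences, this is the same calculation as any one-sample interval for a mean, which is why no new technique is needed.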

7. Estimating the mean difference in paired data

The 95% confidence interval is found to be (-1.78, 0.69).

8. Interpreting the CI for mean difference in paired data

So how do we interpret this 95% confidence interval for the mean difference between the reading and writing scores? A standard interpretation would say something along the lines of "the 95% confidence interval for the mean difference in reading and writing scores (read minus write) is -1.78 to 0.69." But a good interpretation should convey directionality, in other words, tell us which group is bigger than the other. In this case, we are 95% confident that the average reading score is between 1.78 points lower and 0.69 points higher than the average writing score.
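The directional reading can be sketched as a small helper that turns the sign of each bound into "lower" or "higher". The function name describe_interval and its parameters are hypothetical, introduced only for illustration; the bounds passed in are the ones reported in the video.

```python
def describe_interval(lower, upper, first="reading", second="writing", level=95):
    """Phrase a confidence interval for (first minus second) directionally."""

    def side(bound):
        # A negative bound means the first group is lower than the second.
        direction = "lower" if bound < 0 else "higher"
        return f"{abs(bound):.2f} points {direction}"

    return (
        f"We are {level}% confident that the average {first} score is "
        f"between {side(lower)} and {side(upper)} than the average {second} score."
    )


# Bounds reported in the video for read minus write.
message = describe_interval(-1.78, 0.69)
print(message)
```

Since the interval spans both negative and positive values, the sentence necessarily mentions both directions.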

9. Let's practice!

Time to put this into practice.