Comparing two means

1. Comparing two means

Great work on the one-sample t-test! In this lesson, we will review the two-sample t-test.

2. Comparing two means

Comparing the means of two populations based on samples is a useful skill for professionals who research the effect of a given factor. Imagine that you need to determine which of two routes is quicker on average to help your company's drivers. An interviewer might want to check whether you can support management with a data-based solution.

3. Hypotheses

In the one-sample t-test, we were testing the population's mean against a given value. In the two-sample t-test, the null hypothesis states that the means of two different populations are equal.
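Written out, the hypotheses could look like the following. The symbols are assumptions introduced here for illustration: \mu_1 and \mu_2 stand for the two population means. The alternative shown is the two-sided one; a one-sided alternative is also possible.

$$
H_0: \mu_1 = \mu_2 \qquad \text{versus} \qquad H_1: \mu_1 \neq \mu_2
$$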

4. Assumptions

The two-sample t-test has the same assumptions as the one-sample t-test. The underlying data must be normally distributed, the samples need to be random, and the observations need to be independent. Additionally, the two-sample t-test in its classic pooled-variance form requires that the two populations have equal variances.
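To see where the equal-variance assumption enters, here is a sketch of the pooled-variance test statistic. The notation is an assumption made for this illustration: \bar{x}_i, s_i^2 and n_i are the mean, variance and size of sample i, and s_p^2 is the pooled variance estimate.

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}
$$

Under the null hypothesis and the assumptions above, this statistic follows a t-distribution with n_1 + n_2 - 2 degrees of freedom. Pooling the two sample variances into a single estimate only makes sense if both populations share the same variance.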

5. Assumptions

You can test if samples are from populations with equal variances using, for example, Bartlett's test.
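A minimal sketch of such a check with `bartlett.test()` from base R's stats package. The data frame `scores` and its columns `score` and `method` are hypothetical names, and the values are simulated purely for illustration.

```r
# Simulated (made-up) exam scores for two teaching methods
set.seed(1)
scores <- data.frame(
  score  = c(rnorm(20, mean = 70, sd = 8), rnorm(20, mean = 74, sd = 8)),
  method = factor(rep(c("red", "blue"), each = 20))
)

# Bartlett's test: the null hypothesis is that the group variances are equal
result <- bartlett.test(score ~ method, data = scores)
print(result$p.value)
```

A small p-value would be evidence against equal variances. A large one does not prove that the variances are equal; it only means the data give no evidence against it.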

6. Teaching methods

Let's go through a quick example of the application of the two-sample t-test. Imagine that each box represents a student.

7. Teaching methods

Students were taught with one of the two methods. The red population was taught with the first method, and the blue population was taught with the second method.

8. Teaching methods

The principal of the school draws a sample from each of the two populations and gives the sampled students an exam.

9. Teaching methods

He records the exam scores

10. Teaching methods

and calculates the mean of the results of each sample. As you can see, the red sample has a higher mean score than the blue sample.

11. Teaching methods

The principal assumes that the red teaching method is better, which seems like a natural conclusion. But in fact, there might be no difference between the groups. Let's see why.

12. Teaching methods

These are possible distributions of the two populations. We see that the peak of the blue distribution sits at a higher score than the peak of the red distribution, so the blue population has the higher mean. On the previous slide, we saw that the red sample had the higher mean score.

13. Teaching methods

The dots represent samples of students.

14. Teaching methods

The results of the blue sample

15. Teaching methods

are, on average, lower than the results of the red sample, even though the mean of the whole blue population is higher. We see that we can't draw sound conclusions from the sample means alone.
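The effect described above can be reproduced with a small simulation. This is only a sketch under assumed numbers: the population means (70 for red, 72 for blue), the standard deviation (10), and the sample size (10 students) are made up for illustration and do not come from the lesson.

```r
set.seed(42)

n_students <- 10    # assumed size of each sample
n_repeats  <- 1000  # how many times the sampling is repeated

# For each repetition, draw one sample per population and record
# whether the red sample mean exceeds the blue sample mean,
# although the blue population mean is higher (72 vs. 70).
red_sample_higher <- replicate(n_repeats, {
  red_sample  <- rnorm(n_students, mean = 70, sd = 10)
  blue_sample <- rnorm(n_students, mean = 72, sd = 10)
  mean(red_sample) > mean(blue_sample)
})

# Share of repetitions in which the sample means point the "wrong" way
share_red_higher <- mean(red_sample_higher)
print(share_red_higher)
```

With these assumed settings, a noticeable share of the repetitions puts the red sample ahead, which is why a formal test is needed rather than a direct comparison of two sample means.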

16. Paired t-test

Sometimes we don't analyze two distinct datasets but the same set of observations measured twice. In that case, you need to use a paired t-test.

17. Paired t-test

In a two-sample t-test, we draw samples from two populations

18. Paired t-test

and compare the means.

19. Paired t-test

In a paired t-test, we draw a sample from one population and measure it twice.

20. Paired t-test

We compare the true means before and after. Suppose that you are interested in measuring the effectiveness of your company's training program. You can measure employees' knowledge before and after training and test the difference of means using a paired t-test.

21. t-test in R

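To perform a two-sample t-test in R, you can use the t.test function. You need to specify a formula with a numerical variable on the left and a factor variable that identifies the two groups on the right. You should also set the var.equal parameter to TRUE, because the default, FALSE, runs Welch's t-test, which does not assume equal variances.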
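A minimal sketch of such a call. The data frame `scores` and its columns `score` and `method` are hypothetical names, and the values are simulated for illustration only.

```r
# Simulated (made-up) exam scores for two teaching methods
set.seed(1)
scores <- data.frame(
  score  = c(rnorm(20, mean = 70, sd = 8), rnorm(20, mean = 74, sd = 8)),
  method = factor(rep(c("red", "blue"), each = 20))
)

# Formula interface: numerical variable ~ two-level factor.
# var.equal = TRUE requests the pooled-variance (Student's) t-test.
result <- t.test(score ~ method, data = scores, var.equal = TRUE)

print(result$statistic)  # t statistic
print(result$parameter)  # degrees of freedom (n1 + n2 - 2)
print(result$p.value)    # p-value of the two-sided test
```

Leaving out `var.equal = TRUE` would still run, but it would perform Welch's test with adjusted degrees of freedom instead of the pooled-variance test.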

22. t-test in R

For a paired t-test, you need to set the paired parameter to TRUE.
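A minimal sketch of a paired test, using the training example from earlier. The vectors `before` and `after` are hypothetical names with simulated values; each position in the two vectors is assumed to belong to the same employee. The two-vector interface is used here because recent R versions no longer accept `paired` in the formula interface.

```r
# Simulated (made-up) knowledge scores for 15 employees
set.seed(1)
before <- rnorm(15, mean = 60, sd = 10)
after  <- before + rnorm(15, mean = 5, sd = 4)

# Paired t-test: observations are matched by position
paired_result <- t.test(after, before, paired = TRUE)

# Equivalent one-sample t-test on the per-employee differences
diff_result <- t.test(after - before, mu = 0)

print(paired_result$p.value)
print(all.equal(unname(paired_result$statistic),
                unname(diff_result$statistic)))
```

The second call illustrates the design of the paired test: it reduces to a one-sample t-test of whether the mean of the individual differences is zero.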

23. Summary

In this lesson, we covered the two-sample t-test, including its hypotheses and assumptions. We also talked about the paired t-test and the t.test function in R.

24. Let's practice!

Now it's your turn to compare two means in R!