
Is that difference meaningful?

1. Testing differences between groups

So far, you've used data frames and graphs to compare attributes of different groups. You put together your insights and presented them, only to be asked, "Is that a significant difference?" What does it mean to say a difference between two groups is significant?

2. Comparing two groups

Suppose you are comparing the heights of two samples of women. One was randomly sampled from the population of all women from Iowa. You want to know whether the second group was sampled from the same population as the first. The average height of the first group is 5'6", and the average height of the second group is 6'2". It's possible that a random sample of women from Iowa could be that tall, but it's unlikely.

3. Quantifying the likelihood

You can use a statistical test to quantify how surprising the second group's heights would be if it had come from the same overall population as the first. Behind the scenes, the test is using the first sample to estimate the average height for the population of all women in Iowa, and how much a woman's height is likely to vary from that average. Then it uses that estimate to determine how likely it is that a sample drawn from that same population would differ from the first sample by at least as much as the second sample does. That probability is called the p-value. In this example, the p-value is 0.018. When the p-value is less than a pre-determined level, such as 0.05, you have reasonable evidence that the second group is not a sample of all women from Iowa, but rather might be a sample of women from a different population. When we say a difference between two groups is "significant", we mean that a difference this large would be unlikely if the two groups were samples from the same population.
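To make the idea of a p-value concrete, here is a minimal simulation sketch. It is not the calculation a t-test actually performs, and every number in it is an assumption made for illustration: the first sample is simulated, and the size of the second sample (5 women) is invented. It estimates the population mean and spread from the first sample, then counts how often a random sample from that population ends up at least as far from the average as the second group did.

```r
set.seed(1)

# Hypothetical first sample: heights in inches of 25 women (simulated values)
first_sample <- rnorm(25, mean = 66, sd = 3)

# Estimate the population average and spread from the first sample
pop_mean <- mean(first_sample)
pop_sd <- sd(first_sample)

# Second group: average height of 6'2" (74 inches); the group size is assumed
second_mean <- 74
n_second <- 5

# Draw many samples of the same size from the estimated population
sim_means <- replicate(10000, mean(rnorm(n_second, pop_mean, pop_sd)))

# Share of simulated samples at least as far from the average as the second group
p_sim <- mean(abs(sim_means - pop_mean) >= abs(second_mean - pop_mean))
p_sim
```

A small value of `p_sim` means that samples as tall as the second group rarely come out of the estimated Iowa population, which is the intuition behind a small p-value.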

4. The t-test

There are many statistical tests available, and the correct one to use depends on the data you're testing. You'll use two tests in this course: the t-test and the chi-squared test. Both can be used to compare differences between two groups, but the two tests are not interchangeable. You use the t-test when the attribute you've chosen to compare is continuous, such as salary or years of tenure. To test whether managers and non-managers have significantly different tenure, call t.test() with tenure on the left of the tilde and is_manager on the right, and identify the data frame the variables come from. The output includes some useful information, but we'll focus on the p-value. Here it's 0.22, which is not less than 0.05. The test result is not significant, so we cannot conclude that tenure differs between managers and non-managers.
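As a rough sketch of what that call can look like: the data frame name `survey` and its values are made up here for illustration (simulated, not the course data), so the p-value you get will not match the 0.22 mentioned above. Only the column names `tenure` and `is_manager` come from the lesson.

```r
set.seed(42)

# Hypothetical data frame with simulated values; only the column names
# follow the lesson
survey <- data.frame(
  is_manager = factor(rep(c("Manager", "Non-manager"), times = c(30, 120))),
  tenure = c(rnorm(30, mean = 6.0, sd = 2.5), rnorm(120, mean = 5.5, sd = 2.5))
)

# Continuous attribute on the left of the tilde, grouping variable on the right
tenure_test <- t.test(tenure ~ is_manager, data = survey)

# Print the full output, then pull out just the p-value
tenure_test
tenure_test$p.value

# TRUE would mean the difference is significant at the 0.05 level
tenure_test$p.value < 0.05
```

Storing the result in an object is optional, but it makes it easy to extract the p-value instead of reading it off the printed output.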

5. The chi-squared test

You use the chi-squared test when you're comparing a categorical attribute, such as being a high performer. In practice, this can be used to compare group composition, such as whether one group has a higher proportion of high performers, or a higher rate of employee turnover. Notice the different syntax here. To test whether managers and non-managers have significantly different turnover rates, pass the left_company and is_manager variables to chisq.test(). There is no data argument for the chi-squared test, so you'll need to specify the data frame for each variable. Here, the p-value is much lower than 0.05, so the test is significant: turnover rates differ between managers and non-managers.
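A minimal sketch of that syntax, again using a hypothetical data frame called `survey` filled with simulated values. Because the data are invented, the resulting p-value says nothing about the result described above; the column names `left_company` and `is_manager` are the only parts taken from the lesson.

```r
set.seed(42)

# Hypothetical data frame with simulated values; only the column names
# follow the lesson
survey <- data.frame(
  is_manager = rep(c("Manager", "Non-manager"), times = c(40, 160)),
  left_company = sample(c("Left", "Stayed"), size = 200, replace = TRUE,
                        prob = c(0.3, 0.7))
)

# No data argument here, so each variable is referenced with survey$
turnover_test <- chisq.test(survey$left_company, survey$is_manager)

# Print the full output, then pull out just the p-value
turnover_test
turnover_test$p.value
```

The two vectors are cross-tabulated internally, so the test compares the proportion of leavers among managers with the proportion among non-managers.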

6. Where are the formulas?

That's all you need to use t-tests and chi-squared tests to compare groups. We've skipped the details and statistical assumptions underlying these tests, but you can get started by knowing which test to choose, and letting R do the heavy lifting.

7. Let's practice!

To learn more about these tests, check out the statistics courses here on DataCamp.