Student's t-test

1. Our first hypothesis test - Student's t-test

It is possible to find all kinds of patterns in data.

2. From observed pattern to reliable result

Some are expected, while others are more surprising. However, most datasets also include random variation. Knowing this, how can we go from a simple observation to a reliable result?

3. Are these groups different?

Let's say we have the body weights of two samples from two groups of people, A and B. When we plot it, we seem to see a trend, where the group mean for sample B is larger than that for sample A. Is this difference real, or simply random variation?
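To make this concrete, here is a minimal sketch of the comparison. The weights are synthetic numbers drawn at random purely for illustration, and the names weights_a, weights_b, and observed_diff are assumptions, not part of the original example.

```python
import numpy as np

# Synthetic body weights in kg (made-up values, for illustration only).
rng = np.random.default_rng(seed=0)
weights_a = rng.normal(loc=70.0, scale=10.0, size=30)
weights_b = rng.normal(loc=75.0, scale=10.0, size=30)

# Compare the two sample means.
mean_a = weights_a.mean()
mean_b = weights_b.mean()
observed_diff = mean_b - mean_a

print(f"Mean of sample A: {mean_a:.1f} kg")
print(f"Mean of sample B: {mean_b:.1f} kg")
print(f"Observed difference (B - A): {observed_diff:.1f} kg")
```

A difference in sample means on its own does not tell us whether the populations differ, which is why we need a hypothesis test.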

4. Two hypotheses

To draw a conclusion, we will need to distinguish between two cases or hypotheses. In statistics, our starting point is the "null hypothesis": that there isn't anything interesting happening and the observed patterns are just the product of random chance. With enough evidence, we can reject the null hypothesis and turn to the more interesting "alternative hypothesis": that the difference between these samples represents a real difference between the populations.

5. Some statistical terms

But when do we know to reject the null hypothesis? Here we turn to two quantities: the p-value and alpha. The p-value is the probability of observing a pattern at least as extreme as ours if the null hypothesis were correct. We can't be 100 percent sure that our pattern couldn't have emerged due to random chance, but we can quantify the probability that random chance would produce such a pattern; this is the p-value. The smaller the p-value is, the harder it is for the null hypothesis to account for our observations. When p falls below a threshold chosen in advance, which we call alpha, we reject the null hypothesis. A standard value for alpha is 0.05. So, when there is less than a 5 percent probability that random chance alone would produce the pattern observed, it's usually considered reasonable to reject the null hypothesis.
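One way to build intuition for the p-value is to simulate random chance directly. The sketch below is not a t-test; it is a simple label-shuffling (permutation) illustration on synthetic data, and every name and number in it is an assumption made for the example. It counts how often a random relabeling of the pooled values produces a mean difference at least as large as the one observed.

```python
import numpy as np

# Synthetic samples (made-up values, for illustration only).
rng = np.random.default_rng(seed=1)
sample_a = rng.normal(70.0, 10.0, size=30)
sample_b = rng.normal(75.0, 10.0, size=30)
observed = abs(sample_b.mean() - sample_a.mean())

# Under the null hypothesis the group labels carry no information,
# so we shuffle the pooled values and recompute the difference.
pooled = np.concatenate([sample_a, sample_b])
n_shuffles = 5000
count = 0
for _ in range(n_shuffles):
    shuffled = rng.permutation(pooled)
    diff = abs(shuffled[30:].mean() - shuffled[:30].mean())
    if diff >= observed:
        count += 1

# Fraction of shuffles at least as extreme as what we observed.
p_value = count / n_shuffles

alpha = 0.05
reject_null = p_value < alpha
print(f"Estimated p-value: {p_value:.4f}")
print(f"Reject the null hypothesis at alpha={alpha}: {reject_null}")
```

The t-test reaches the same kind of conclusion analytically, without shuffling, by relying on assumptions about how the data are distributed.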

6. Student's t-test

To compare two sets of values for a continuous variable, we will use Student's t-test. This test was invented by William Sealy Gosset, who published under the pseudonym "Student", as a means of monitoring beer quality for Guinness. A noble endeavor, to be sure! There are two basic types. A one-sample t-test asks whether the mean of a population differs from a given value. A two-sample t-test asks whether the means of two populations differ from each other. For a two-sample t-test, we'll use the function ttest_ind from scipy.stats and we'll give it two arrays. This returns a result that holds the t-statistic at index 0 and the p-value at index 1.
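To show what ttest_ind computes, here is a rough sketch that calculates the pooled-variance t-statistic by hand and compares it with SciPy's result. The data are synthetic and the variable names are assumptions for illustration. With its default settings, ttest_ind assumes equal variances in the two populations and runs a two-sided test.

```python
import numpy as np
from scipy import stats

# Synthetic samples (made-up values, for illustration only).
rng = np.random.default_rng(seed=3)
a = rng.normal(70.0, 10.0, size=25)
b = rng.normal(75.0, 10.0, size=35)

# Pooled-variance t-statistic computed by hand.
n_a, n_b = len(a), len(b)
dof = n_a + n_b - 2
pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / dof
t_manual = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Two-sided p-value from the t distribution with dof degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), dof)

# The same test using SciPy.
result = stats.ttest_ind(a, b)
print(f"Manual: t={t_manual:.4f}, p={p_manual:.4f}")
print(f"SciPy:  t={result.statistic:.4f}, p={result.pvalue:.4f}")
```

The p-value can be read either as result.pvalue or as result[1].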

7. Implementing a one-sample t-test

For the one-sample t-test, we first import stats from SciPy. Then we use the ttest_1samp function, which takes two arguments. The first is our sample array, taken from our DataFrame, and the second is the value we want to compare the sample mean against. Finally, using a standard alpha value of 0.05, we test whether our p-value falls under alpha.
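These steps might look something like the sketch below. The DataFrame, its weight column, and the reference value of 70 are all hypothetical stand-ins, since the transcript does not show the actual data.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical DataFrame with synthetic weights (illustration only).
rng = np.random.default_rng(seed=2)
df = pd.DataFrame({"weight": rng.normal(72.0, 10.0, size=40)})

# Hypothetical reference value to compare the sample mean against.
reference_mean = 70.0

# One-sample t-test: first argument is the sample, second is the value.
t_result = stats.ttest_1samp(df["weight"], reference_mean)
p_value = t_result.pvalue

# Compare the p-value against the standard alpha.
alpha = 0.05
if p_value < alpha:
    print(f"p={p_value:.4f} < {alpha}: reject the null hypothesis")
else:
    print(f"p={p_value:.4f} >= {alpha}: cannot reject the null hypothesis")
```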

8. Implementing a two-sample t-test

Now, with the two-sample t-test, we can ask whether the members of group B really are heavier than the members of group A. We take two arrays from our DataFrame, use them as the arguments to ttest_ind, save the result as t_result, and then test whether our p-value falls under alpha. It does! It seems that this difference may have some validity.
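A minimal sketch of that workflow follows. The DataFrame layout (a group column and a weight column) and the synthetic values are assumptions, so the p-value printed here will not match the one in the video.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical DataFrame with synthetic weights (illustration only).
rng = np.random.default_rng(seed=4)
df = pd.DataFrame({
    "group": ["A"] * 30 + ["B"] * 30,
    "weight": np.concatenate([
        rng.normal(70.0, 10.0, size=30),
        rng.normal(75.0, 10.0, size=30),
    ]),
})

# Take one array per group from the DataFrame.
sample_a = df.loc[df["group"] == "A", "weight"]
sample_b = df.loc[df["group"] == "B", "weight"]

# Two-sample t-test (two-sided, equal variances assumed by default).
t_result = stats.ttest_ind(sample_a, sample_b)
p_value = t_result.pvalue

alpha = 0.05
print(f"p-value: {p_value:.4f}")
print(f"Below alpha: {p_value < alpha}")
```

By default this test is two-sided, so a small p-value says the group means differ, not which one is larger. Checking the sign of the difference in means, or passing alternative="less" to ttest_ind in recent SciPy versions, addresses the direction.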

9. Now let's try it out!

Now it's time for you to try it out.