Distribution of statistics

1. Distribution of statistics

Previously, we used the difference in proportions

2. Null statistic

to distinguish between the null statistics and the observed statistic. But what if we used the ratio of proportions instead of the difference? We can still perform inference in this case. We are still most interested in whether or not the observed statistic is different from the values obtained by shuffling. There isn't anything magical about differences in proportions except that they help you differentiate between null values and data that support the alternative hypothesis. We will stick with differences for this course as a way to simplify the ideas you are learning. In other courses, you can investigate other statistics like ratios of proportions.
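To make the comparison concrete, here is a rough sketch of shuffling with both statistics computed side by side. The course itself works in R; this version uses plain Python so it runs with no extra packages. The data are an assumption, not taken from the transcript: two hypothetical groups of 24 with 21 and 14 successes, chosen so the observed difference lands near 0.29. The names `group`, `promoted`, and `diff_and_ratio` are made up for illustration.

```python
import random

# Hypothetical data (an assumption, not from the transcript):
# 24 people per group, 21 successes in group A and 14 in group B.
group = ["A"] * 24 + ["B"] * 24
promoted = [1] * 21 + [0] * 3 + [1] * 14 + [0] * 10


def diff_and_ratio(labels, outcomes):
    """Compute both candidate statistics from the same two proportions."""
    a = [o for g, o in zip(labels, outcomes) if g == "A"]
    b = [o for g, o in zip(labels, outcomes) if g == "B"]
    p_a, p_b = sum(a) / len(a), sum(b) / len(b)
    return p_a - p_b, p_a / p_b


obs_diff, obs_ratio = diff_and_ratio(group, promoted)

rng = random.Random(1)  # fixed seed so the sketch is repeatable
null_diffs, null_ratios = [], []
for _ in range(100):
    shuffled = promoted[:]  # copy, then break any link between group and outcome
    rng.shuffle(shuffled)
    d, r = diff_and_ratio(group, shuffled)
    null_diffs.append(d)
    null_ratios.append(r)

# Share of shuffles at least as extreme as the observed value, for each statistic.
share_diff = sum(d >= obs_diff for d in null_diffs) / len(null_diffs)
share_ratio = sum(r >= obs_ratio for r in null_ratios) / len(null_ratios)
print(round(obs_diff, 3), round(obs_ratio, 3), share_diff, share_ratio)
```

Under the null, the differences pile up around 0 and the ratios around 1. In this setup both statistics rank the shuffles the same way, so the two shares printed at the end agree.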

3. Calculating quantiles

One way to measure how far the observed statistic is from the null values is to calculate quantiles of the null statistics.

4. Calculating quantiles

After we've generated 100

5. Calculating quantiles

different shuffles of the original data,

6. Calculating quantiles

we see that the 5% quantile is -0.292.

7. Calculating quantiles

That is, 5% of the null statistics are at -0.292 or below.

8. Calculating quantiles

The 95% quantile is 0.208.

9. Calculating quantiles

That is, 95% of the null statistics are at 0.208 or below, meaning that our observed statistic of 0.29 is larger than 95% of the null statistics.

10. Quantile measurement

Using R, we can get the same quantile measurements, which allow us to compare the null statistics with the observed statistic. Given the previous simulations, we can see that 95% of the null differences are 0.208 or lower. These simulations give further evidence that the observed statistic is not consistent with the bulk of the null differences.
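As a rough illustration of the quantile step, here is a small sketch in plain Python rather than R. It assumes hypothetical data (35 successes among 48 people, split into two groups of 24) and a hand-written `quantile` helper that uses linear interpolation, the same rule R's `quantile()` applies by default. The exact cutoffs it prints depend on the random shuffles, so they need not match the figures quoted above.

```python
import random


def quantile(values, q):
    """Linear-interpolation quantile of a list, for q between 0 and 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def null_differences(n_shuffles=100, seed=1):
    """Shuffle hypothetical outcomes and return one difference in proportions per shuffle."""
    outcomes = [1] * 35 + [0] * 13  # hypothetical: 35 successes among 48 people
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_shuffles):
        rng.shuffle(outcomes)
        p_a = sum(outcomes[:24]) / 24  # first 24 shuffled outcomes play group A
        p_b = sum(outcomes[24:]) / 24  # remaining 24 play group B
        diffs.append(p_a - p_b)
    return diffs


null_diffs = null_differences()
q05 = quantile(null_diffs, 0.05)
q95 = quantile(null_diffs, 0.95)

observed = 0.29  # observed statistic quoted in the transcript
share_below = sum(d <= observed for d in null_diffs) / len(null_diffs)

print("5% quantile:", round(q05, 3))
print("95% quantile:", round(q95, 3))
print("share of null differences at or below the observed value:", share_below)
```

If the share printed on the last line is close to 1, the observed statistic sits in the upper tail of the null differences.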

11. Critical region

Often the quantiles describing the tails of the null distribution determine what is called the critical region.

12. Critical region

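The critical region determines which observed statistics are consistent with the null distribution: a statistic that falls inside the region, beyond the tail quantiles, is considered inconsistent with the null, while a statistic outside the region is consistent with it.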
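A minimal sketch of that check, in plain Python, using the two quantiles quoted earlier in the transcript as cutoffs. Treating both tails as the critical region and using strict inequalities are assumptions made here for illustration; the function name `in_critical_region` is hypothetical.

```python
def in_critical_region(stat, lower_cutoff, upper_cutoff):
    """Return True when stat lies in either tail, beyond the cutoffs."""
    return stat < lower_cutoff or stat > upper_cutoff


lower_cutoff = -0.292  # 5% quantile of the null statistics, from the transcript
upper_cutoff = 0.208   # 95% quantile of the null statistics, from the transcript
observed = 0.29        # observed statistic, from the transcript

observed_is_extreme = in_critical_region(observed, lower_cutoff, upper_cutoff)
zero_is_extreme = in_critical_region(0.0, lower_cutoff, upper_cutoff)

print(observed_is_extreme)  # True: 0.29 is above the upper cutoff of 0.208
print(zero_is_extreme)      # False: 0.0 lies between the two cutoffs
```

With these cutoffs the observed statistic of 0.29 falls in the upper tail, which matches the earlier conclusion that it is not consistent with the bulk of the null differences.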

13. Let's practice!

OK, now it's your turn to practice what you've learned.