1. Using the randomization distribution
Recall that the logic of statistical inference is to compare the observed statistic to the distribution of statistics generated under the null hypothesis. You've now seen how to create that distribution with your own R code. The next question to ask is, how do we use the information in the null distribution?
2. Understanding the null distribution
Remember that each dot represents the statistic computed from a different permutation of the data.
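As a rough sketch of where one dot comes from, the R code below shuffles the drink preferences across coasts and recomputes the difference in proportions. The survey counts, the variable names, and the helper `diff_in_props()` are all hypothetical, made up for illustration; they are not the course's data or code.

```r
set.seed(1)

# Hypothetical survey: 100 people per coast (counts invented for illustration)
location <- rep(c("East", "West"), each = 100)
drink <- c(rep(c("cola", "orange"), times = c(47, 53)),
           rep(c("cola", "orange"), times = c(38, 62)))

# Hypothetical helper: proportion preferring cola, East minus West
diff_in_props <- function(drink, location) {
  mean(drink[location == "East"] == "cola") -
    mean(drink[location == "West"] == "cola")
}

# One permutation: shuffling the drink labels breaks any link with coast,
# so the resulting difference is one dot in the null distribution
one_dot <- diff_in_props(sample(drink), location)

# Five permutations give five dots
five_dots <- replicate(5, diff_in_props(sample(drink), location))
```

Repeating the shuffle many more times is what fills in the full dot plot.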
3. Understanding the null distribution
We use the null differences,
4. Understanding the null distribution
the dots, to define the setting that we are not interested in.
5. Understanding the null distribution
The goal is to show that our observed data are not consistent with the differences generated.
6. Understanding the null distribution
We want our observed data to be different from the null so that we can claim evidence in favor of the alternative research hypothesis.
7. Data consistent with null?
Recall that, as we calculated using R, 9% more people on the East Coast than on the West Coast prefer cola in the observed data.
8. Significance
On the dot plot, the null statistics that are as or more extreme than the observed statistic are colored red. We can see that about a third of the null statistics are as or more extreme than what we observed.
9. How extreme are the observed data?
To be more precise, R can compute the proportion of null statistics that were as or more extreme than the observed statistic. Here, we find that 38% of the null statistics are as or more extreme than the difference which was observed. Thirty-eight percent, in conjunction with the dot plot, gives evidence that the data are consistent with the permuted distribution. We have no evidence that rates of cola preference differ by coast.
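The counting step might look something like the following minimal sketch. It assumes made-up survey counts chosen so that the observed difference is about 0.09, a hypothetical helper `diff_in_props()`, and a one-sided definition of "extreme" (null differences at least as large as the observed one). Because the data are invented, the proportion it produces will not match the 38% reported here.

```r
set.seed(1)

# Hypothetical survey: 100 people per coast (counts invented for illustration)
location <- rep(c("East", "West"), each = 100)
drink <- c(rep(c("cola", "orange"), times = c(47, 53)),
           rep(c("cola", "orange"), times = c(38, 62)))

# Hypothetical helper: proportion preferring cola, East minus West
diff_in_props <- function(drink, location) {
  mean(drink[location == "East"] == "cola") -
    mean(drink[location == "West"] == "cola")
}

# Observed statistic from the unshuffled data
obs_diff <- diff_in_props(drink, location)

# Null distribution: 1000 permuted differences
null_diffs <- replicate(1000, diff_in_props(sample(drink), location))

# Proportion of null statistics as or more extreme than the observed one
# (one-sided assumption; use abs() on both sides for a two-sided version)
prop_extreme <- mean(null_diffs >= obs_diff)
prop_extreme
```

A large proportion means the observed difference looks like a typical draw from the null distribution; a small one means it would be unusual if the null were true.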
10. Let's practice!
OK, now it's your turn to practice what you've learned.