1. Alternate method: the chi-squared distribution
Now you have some experience conducting a hypothesis test of independence using the chi-squared statistic and a permutation approach.
2. Approximation distributions: normal
Next, you'll learn how to conduct the same test using an approximation method. This is the most commonly used way to formulate the null distribution, so let's dive in.
The approximation distribution that you're most familiar with is the normal distribution. It can be used to approximate the null distribution when the statistic is a proportion or a difference in proportions and the sample size is large.
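As a rough illustration of the normal approximation for a difference in proportions, here is a minimal Python sketch. It uses Python rather than the R used in the course, and the function name `two_prop_z_test` and the counts are hypothetical. It assumes the usual pooled standard error under the null hypothesis that the two proportions are equal, and it uses only the standard library.

```python
import math


def two_prop_z_test(x1, n1, x2, n2):
    """Normal-approximation test of H0: p1 == p2, two-sided (hypothetical helper)."""
    p1, p2 = x1 / n1, x2 / n2
    # Pooled proportion: the common success rate assumed under the null hypothesis
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical data: 45 of 100 successes in group 1, 30 of 100 in group 2
z, p = two_prop_z_test(45, 100, 30, 100)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```

The approximation is only trustworthy when both groups are reasonably large, which is the same caveat the chi-squared approximation carries.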
3. Approximation distributions: chi-squared
In the case of the chi-squared statistic, the approximating distribution is called, conveniently enough, the chi-squared distribution. You'll sometimes see it written out in words and other times written with the Greek letter chi, which looks like an X, as χ². The shape of this distribution is determined by a single parameter called the degrees of freedom. Here are three different chi-squared distributions, with one, three, and five degrees of freedom. Note that they all take only non-negative values and are all right-skewed, and that as the degrees of freedom increase, the mean increases (the mean of a chi-squared distribution equals its degrees of freedom). You can find the appropriate degrees of freedom for your test by subtracting one from the number of rows, subtracting one from the number of columns, and multiplying the two: df = (rows - 1) × (columns - 1).
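The degrees-of-freedom formula and the link between degrees of freedom and the mean can be checked with a short simulation. This Python sketch relies on the standard fact that a chi-squared variable with k degrees of freedom is the sum of k squared independent standard normal draws. The helper names `chisq_df` and `simulate_chisq` are hypothetical, and the seed and number of draws are arbitrary choices.

```python
import random


def chisq_df(n_rows, n_cols):
    """Degrees of freedom for a chi-squared test of independence on an r x c table."""
    return (n_rows - 1) * (n_cols - 1)


def simulate_chisq(df, n_draws=50_000, seed=1):
    """Draw from chi-squared(df) as the sum of df squared standard normal draws."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(n_draws)]


print(chisq_df(3, 3))  # → 4, the 3 x 3 natspac-by-party table from the text

# Sample means should land close to 1, 3 and 5, and no draw is negative
for k in (1, 3, 5):
    draws = simulate_chisq(k)
    print(k, round(sum(draws) / len(draws), 2), min(draws) >= 0)
```

A histogram of each set of draws would also show the right skew, which shrinks as k grows.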
4. H-test via approximation
In the last exercises, you conducted a permutation test to assess whether natspac was independent of party. The observed chi-squared statistic was 1.33, which you found to be not very far into the right tail of your null distribution.
5. H-test via approximation
Instead of using this permutation null distribution, we could rely on the chi-squared distribution. The table had three rows and three columns, so the appropriate chi-squared distribution is the one with (3 - 1) × (3 - 1) = 4 degrees of freedom. If we plot it on top of our null distribution, we see that the two are very close.
To find a p-value according to this distribution, use pchisq to get the proportion of the chi-squared distribution that lies to the left of our observed statistic. We're interested in the right tail, however, so the p-value is 1 minus this value, that is, 1 - pchisq(1.33, df = 4). Not surprisingly, this is very close to the p-value from the permutation approach.
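The same right-tail calculation can be sketched in Python without extra libraries. This is a stand-in for R's `pchisq`, not the course's code: the hypothetical `chisq_cdf` evaluates the chi-squared CDF through the series expansion of the regularized lower incomplete gamma function, and the statistic 1.33 with 4 degrees of freedom comes from the text. If SciPy is available, `scipy.stats.chi2.sf(1.33, 4)` gives the right-tail probability directly.

```python
import math


def chisq_cdf(x, df, tol=1e-12, max_terms=10_000):
    """P(X <= x) for X ~ chi-squared(df), a stand-in for R's pchisq(x, df).

    Uses the series for the regularized lower incomplete gamma function
    P(a, z) with a = df / 2 and z = x / 2.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term = 1.0 / a  # n = 0 term of sum_n z**n / (a * (a + 1) * ... * (a + n))
    total = term
    for n in range(1, max_terms):
        term *= z / (a + n)
        total += term
        if term < tol * total:
            break
    # Multiply by z**a * exp(-z) / Gamma(a), computed on the log scale for stability
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


observed_stat = 1.33  # observed chi-squared statistic from the text
df = 4                # (3 - 1) * (3 - 1)
p_value = 1 - chisq_cdf(observed_stat, df)
print(round(p_value, 3))  # → 0.856
```

Subtracting the CDF from 1 mirrors the description above; for very small p-values a dedicated survival-function routine is more accurate, because 1 - cdf loses precision when the CDF is close to 1.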
6. The chi-squared distribution
As with the normal distribution, the chi-squared distribution is only a good approximation when the sample size is large. A good rule of thumb is that the expected count in each cell should be five or greater. Another recommendation is to use this distribution only when the degrees of freedom are two or greater. If you have one degree of freedom, you're looking at a two-by-two table, which means you can simply compare two proportions using the normal distribution.
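The two rules of thumb can be turned into a quick check. In this minimal Python sketch the helper names `expected_counts` and `approximation_ok` and the 3 x 3 table of counts are hypothetical; expected counts are computed under independence as row total times column total divided by the grand total.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def approximation_ok(table, min_expected=5):
    """Rules of thumb from the text: every expected count >= 5 and df >= 2."""
    df = (len(table) - 1) * (len(table[0]) - 1)
    expected = expected_counts(table)
    return df >= 2 and all(e >= min_expected for row in expected for e in row)


# Hypothetical 3 x 3 table of observed counts
counts = [[20, 15, 10],
          [25, 30, 12],
          [8, 6, 4]]

print(expected_counts(counts)[2][2])  # → 3.6, the smallest expected count
print(approximation_ok(counts))       # → False
```

For this made-up table the bottom-right cell has an expected count of 18 × 26 / 130 = 3.6, so the check fails even though most cells are comfortably above five.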
7. Let's practice!
OK, now it's your turn to practice.