
Assumptions in hypothesis testing

1. Assumptions in hypothesis testing

Each hypothesis test makes assumptions about the data. It's only when these assumptions are met that it is appropriate to use that hypothesis test.

2. Randomness

Whether it uses one or two samples, every hypothesis test assumes that each sample is randomly sourced from its population. If a sample is not random, it may not be representative of the population. To check this assumption, you need to know where your data came from; no statistical or coding test can verify it. If in doubt, ask the people involved in collecting the data, or a domain expert who understands the population being sampled.

3. Independence of observations

Tests also assume that each observation is independent. There are some special cases, like paired t-tests, where dependencies between two samples are allowed, but these change the calculations, so you need to understand where such dependencies occur. As you saw with the paired t-test, not accounting for dependencies increases the chance of false negative and false positive errors. This problem is also difficult to diagnose after you have the data, so it needs to be discussed before data collection.

4. Large sample size

Hypothesis tests also assume that your sample is big enough. Smaller samples incur greater uncertainty, and the Central Limit Theorem's approximation may not hold well for them, which in turn means that the sampling distribution might not be approximately normal. The increased uncertainty means you get wider confidence intervals on the parameter you are trying to estimate. When the Central Limit Theorem's approximation is poor, the calculations on the sample can be unreliable, which increases the chance of false negative and false positive errors. The check for "big enough" depends on the test, and that's where we'll head next.
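To make the "wider confidence intervals" point concrete, here is a minimal sketch. It assumes the usual normal-approximation interval for a mean, where the half-width is roughly 1.96 times the standard deviation divided by the square root of the sample size. The function name `ci_half_width` and the numbers are made up for illustration.

```python
import math


def ci_half_width(sample_sd, n, z=1.96):
    """Approximate half-width of a 95% confidence interval for a mean.

    Uses the normal approximation z * sd / sqrt(n), which is itself
    only trustworthy when the sample is reasonably large.
    """
    return z * sample_sd / math.sqrt(n)


# Hypothetical standard deviation of 1.0, with growing sample sizes
for n in (10, 30, 100, 1000):
    print(n, round(ci_half_width(1.0, n), 3))
```

The half-width shrinks as the sample grows, so a small sample leaves you with a wide range of plausible values for the parameter.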

5. Large sample size: t-test

For one-sample t-tests, a popular heuristic is that you need at least thirty observations in your sample. For the two-sample case or ANOVA, you need at least thirty observations in each group. That means you can't compensate for one small group by making the other one bigger. In the paired case, you need at least thirty pairs of observations.
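As a rough sketch of the thirty-observation heuristic, you could count the observations in each group and confirm that every group reaches the threshold. The helper `groups_big_enough` and the group labels are hypothetical, and the threshold is a rule of thumb rather than a strict requirement.

```python
from collections import Counter

MIN_OBS = 30  # heuristic threshold from the text, not a hard rule


def groups_big_enough(labels, min_obs=MIN_OBS):
    """Return True if there is at least one group and every group has min_obs rows."""
    counts = Counter(labels)
    return bool(counts) and all(n >= min_obs for n in counts.values())


# Hypothetical group labels for a two-sample comparison
labels = ["control"] * 45 + ["treatment"] * 12
print(Counter(labels))
print(groups_big_enough(labels))
```

Here the check fails because the treatment group is too small, even though the control group is well over thirty.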

6. Large sample size: proportion tests

For one-sample proportion tests, the sample is considered big enough if it contains at least ten successes and at least ten failures. Notice that if the probability of success is close to zero or close to one, then you need a bigger sample to reach ten of each. In the two-sample case, the size requirements apply to each sample separately.
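The ten-successes-and-ten-failures rule can be sketched as a simple count. The function name `enough_successes_and_failures` and the example outcomes are invented for illustration. In the two-sample case you would run the same check on each sample.

```python
def enough_successes_and_failures(outcomes, min_count=10):
    """Check a list of True/False outcomes against the min_count rule."""
    n_success = sum(1 for x in outcomes if x)
    n_failure = len(outcomes) - n_success
    return n_success >= min_count and n_failure >= min_count


# Hypothetical samples: same size, very different success rates
balanced = [True] * 50 + [False] * 50    # 50% successes in 100 rows
rare = [True] * 5 + [False] * 95         # 5% successes in 100 rows
rare_large = [True] * 50 + [False] * 950  # 5% successes in 1000 rows

for name, sample in [("balanced", balanced), ("rare", rare), ("rare_large", rare_large)]:
    print(name, enough_successes_and_failures(sample))
```

The `rare` sample fails with 100 rows but passes with 1000, which is the "close to zero or one needs a bigger sample" point in action.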

7. Large sample size: chi-square tests

The chi-square test is slightly more forgiving: it only requires at least five successes and at least five failures in each group, rather than ten.
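For the chi-square case, one possible sketch is to store the counts per group and confirm that every cell reaches five. The table layout, the group names, and the helper `chi_square_counts_ok` are assumptions made for this example.

```python
def chi_square_counts_ok(table, min_count=5):
    """Return True if every success/failure count in every group is >= min_count."""
    return all(
        count >= min_count
        for group_counts in table.values()
        for count in group_counts.values()
    )


# Hypothetical counts of successes and failures in two groups
table = {
    "group_a": {"success": 12, "failure": 7},
    "group_b": {"success": 4, "failure": 20},
}
print(chi_square_counts_ok(table))
```

In this made-up table, group_b has only four successes, so the check reports that the sample is not big enough.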

8. Sanity check

One more check you can perform is to calculate a bootstrap distribution and visualize it with a histogram. If you don't see a roughly bell-shaped, normal-looking curve, then at least one of the assumptions may not have been met. In that case, you should revisit the data collection process, and see whether any of the three assumptions of randomness, independence, and sample size does not hold.
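Here is a minimal sketch of that sanity check using only the standard library. It assumes the statistic of interest is the sample mean, and it prints a rough text histogram in place of a plotted one. The sample is simulated, and the names `bootstrap_means` and `text_histogram` are hypothetical.

```python
import random


def bootstrap_means(sample, n_resamples=2000, seed=0):
    """Resample with replacement and record the mean of each resample."""
    rng = random.Random(seed)
    n = len(sample)
    return [sum(rng.choices(sample, k=n)) / n for _ in range(n_resamples)]


def text_histogram(values, n_bins=10, width=40):
    """Print a crude horizontal histogram and return the bin counts."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins or 1.0  # avoid a zero bin width
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / step), n_bins - 1)
        counts[idx] += 1
    peak = max(counts)
    for i, c in enumerate(counts):
        print(f"{lo + i * step:8.2f} | {'#' * round(width * c / peak)}")
    return counts


# Simulated, skewed sample of 200 observations (illustrative only)
rng = random.Random(42)
sample = [rng.expovariate(1.0) for _ in range(200)]

boot = bootstrap_means(sample)
counts = text_histogram(boot)
```

With real data you would replace the simulated sample with your own column and look at whether the bars form a single, roughly symmetric hump.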

9. Let's practice!

Let's check some assumptions!
