1. Hypothesis testing
A very important concept in experimental design is the formation and testing of a hypothesis, or your central research question. For the ToothGrowth dataset we worked with previously, the hypothesis concerned the effect of different doses and administration methods of Vitamin C on the length of tooth growth in the guinea pig.
Let's dig in a little more and look at how to create a research hypothesis.
2. Breaking down hypothesis testing
There are really two hypotheses that are grouped together: the null and alternative hypotheses.
The null hypothesis is exactly what it sounds like, and the implications change depending on what you're testing. For example, in the tooth growth experiment, the null hypothesis is: "There is no effect of vitamin C dosage or administration type on guinea pig tooth growth."
There's some nuance involved in the alternative hypothesis, and its construction will help lead you to the correct test. If you're testing whether the mean is only less than (or only greater than) a value, it's a one-sided test. If you're testing whether it's not equal to some number, that's a two-sided test.
Recall when we conducted a two-sided test to determine whether the mean length of tooth growth was not equal to 18. The p-value was 0.4135, so at the 0.05 significance level, we fail to reject the null hypothesis: we have no strong evidence that the mean differs from 18.
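As a refresher, a two-sided one-sample test like that one can be run with base R's t.test(), where the mu argument sets the hypothesized mean and ToothGrowth is the built-in dataset:

```r
# Two-sided one-sample t-test: is mean tooth length different from 18?
# ToothGrowth ships with base R, so no package loading is needed.
t.test(ToothGrowth$len, mu = 18, alternative = "two.sided")
# The reported p-value is 0.4135, so at the 0.05 level we fail to
# reject the null hypothesis that the mean equals 18.
```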
3. Power and sample size
Directly related to hypothesis testing is the idea of power. Power is the probability that the test correctly rejects the null hypothesis when the alternative hypothesis is true. One "golden rule" in statistics is to aim to have 80% power in your experiments, which you'll need an adequate sample size to achieve.
Effect size, in the context of power analysis, is a standardized measure of the difference you're trying to detect, calculated as the difference between group means divided by the pooled standard deviation of the data. It's easier to detect a larger difference in means than a smaller one.
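That calculation can be sketched by hand in a few lines; the two groups below are hypothetical data, used only to illustrate the formula:

```r
# Illustrative calculation of a standardized effect size (Cohen's d)
# from two hypothetical groups of measurements.
group1 <- c(20, 22, 19, 24, 25)
group2 <- c(18, 17, 21, 16, 19)
n1 <- length(group1)
n2 <- length(group2)
# Pooled standard deviation: each group's variance weighted by its
# degrees of freedom.
sd_pooled <- sqrt(((n1 - 1) * var(group1) + (n2 - 1) * var(group2)) /
                    (n1 + n2 - 2))
# Effect size = difference in group means / pooled standard deviation.
d <- (mean(group1) - mean(group2)) / sd_pooled
d
```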
Sample size is important in experiments. In general, as sample size increases, power increases; you've collected more information, so you're more capable of examining your hypotheses.
Given any two of these three quantities (power, effect size, and sample size), you can calculate the third: with a desired power and an effect size in hand, for example, you can solve for the required sample size. Let's review an example.
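As a quick sketch of solving for the third quantity, the pwr package's pwr.t.test() function will compute the per-group sample size for a two-sample t-test when you supply the effect size and desired power but leave n unspecified:

```r
library(pwr)  # install.packages("pwr") if it isn't installed yet

# Supply effect size (d) and power, leave n out: the function
# solves for the required sample size per group.
pwr.t.test(d = 0.5, power = 0.8, sig.level = 0.05,
           type = "two.sample", alternative = "two.sided")
# Returns n of roughly 64 per group for this medium effect size.
```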
4. Power and sample size calculations
We're going to calculate power and sample size using the pwr package in this course.
Let's look at calculating power for an ANOVA, or Analysis of Variance, test. This is good prep for when we execute these later in the course. The pwr.anova.test() function takes five arguments, one of which must be entered as NULL so it can be calculated: k is the number of groups in the comparison, n is the number of observations per group, f is the effect size, and you also have to enter a significance level and a power.
So to calculate power for a test with three groups, 20 people per group, an effect size of 0.2, and a significance level of 0.05, you would enter this code. Calculating it returns a power of about 0.25, which is not great! We probably can't detect that small an effect size with so few people in each group.
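The call looks like this, with power left as NULL so that's what the function solves for:

```r
library(pwr)  # install.packages("pwr") if it isn't installed yet

# k = 3 groups, n = 20 observations per group, effect size f = 0.2,
# significance level 0.05. power = NULL tells the function to
# calculate power from the other four pieces.
pwr.anova.test(k = 3, n = 20, f = 0.2,
               sig.level = 0.05, power = NULL)
# power comes out to about 0.25 -- too low to reliably detect
# an effect this small with groups this size.
```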
5. Let's practice!
Let's go to the exercises and review one- and two-sided tests, plus examine the pwr package to calculate power and sample size.