1. Power of a test
In this lesson, we'll learn all about statistical power, which is the probability of correctly detecting an effect, if indeed an effect is present. Let's dive in!
2. What determines power?
We'd like experiments to have large sample sizes, a treatment with a big impact, and a reasonable significance level, meaning one that is neither too big nor too small.
Together, these three factors will influence the outcome of an experiment.
3. Simulating weight loss data
Imagine a weight loss study where the treatment group participates in exercise, and the control group does nothing. We then measure the weight loss for each group.
Suppose the control group has no weight loss on average, with a standard deviation of one pound. The treatment group has an average weight loss of two pounds with the same standard deviation. We can simulate normal data like this by calling norm-dot-rvs, from SciPy. We provide the control group mean by setting loc equal to zero, the standard deviation by setting scale equal to one, and the sample size by setting size equal to 100.
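As a rough sketch, assuming SciPy's norm.rvs as just described (the variable names here are only for illustration), the simulation could look like this:

    from scipy.stats import norm

    # Control group: average weight loss of 0 pounds, standard deviation of 1 pound
    control = norm.rvs(loc=0, scale=1, size=100)

    # Treatment group: average weight loss of 2 pounds, same standard deviation
    treatment = norm.rvs(loc=2, scale=1, size=100)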
4. T-test
In this case, we know that the two groups have different mean weight loss, because we picked the values in our simulations! If the null hypothesis is that there is no difference in weight loss between the groups, we know that is wrong. The correct conclusion is that the treatment group had more weight loss than the control group. If we use this data in a two-sample t-test, will our test come to that conclusion?
Here we conduct a t-test and see that the test did in fact reject the null in favor of the alternative. The test came to the correct conclusion.
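One common way to run this test is scipy.stats.ttest_ind, reusing the simulated arrays from above:

    from scipy.stats import ttest_ind

    # Two-sample t-test comparing treatment and control weight loss
    t_stat, p_value = ttest_ind(treatment, control)
    print(p_value)  # a p-value below 0.05 means we reject the null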
5. Small sample size
What if instead of 100 people, we had just five people in each group? In this case, our test incorrectly fails to reject the null. That is because our sample size is too small.
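Repeating the simulation with size set to 5 illustrates the idea; this is just a sketch, and any single run may or may not reject:

    # Same simulation, but with only five people per group
    control_small = norm.rvs(loc=0, scale=1, size=5)
    treatment_small = norm.rvs(loc=2, scale=1, size=5)
    # With so few observations, the p-value can land above 0.05,
    # so the test can fail to reject the null even though the effect is real
    print(ttest_ind(treatment_small, control_small).pvalue)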
6. Small effect size
Instead of two pounds, what if the weight loss was zero-point-two pounds? Again, our test comes to the incorrect conclusion of no difference!
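A sketch of that scenario, keeping 100 people per group but shrinking the true difference to 0.2 pounds:

    # Treatment group now averages only 0.2 pounds of weight loss
    treatment_weak = norm.rvs(loc=0.2, scale=1, size=100)
    print(ttest_ind(treatment_weak, control).pvalue)  # often above 0.05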
7. Small effect? Big sample!
The problem is that our effect size is too small for our sample size and significance level. By increasing our sample size, we can combat the small effect size.
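For instance, with an illustrative 1,000 people per group (a number chosen here just to make the point), the 0.2-pound effect becomes much easier to detect:

    control_big = norm.rvs(loc=0, scale=1, size=1000)
    treatment_big = norm.rvs(loc=0.2, scale=1, size=1000)
    print(ttest_ind(treatment_big, control_big).pvalue)  # usually well below 0.05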
8. Defining the power of a test
The underlying question is: if there is a significant effect, will our test be able to detect it?
This is power. Said more precisely, if the alternative is true, how likely is our test to reject the null in favor of the alternative?
When designing an experiment, it's best to calculate power before collecting a sample.
9. Calculating power
Statsmodels has different functions for calculating power in different tests. Since this is a t-test with independent samples, we use TTestIndPower.
We'll use an effect size of zero-point-two, which here matches the zero-point-two-pound weight loss because the standard deviation is one pound. We have 100 people in each group, and alpha is five percent.
The probability that our test detects this difference and rejects the null is only twenty nine-point-one percent. No wonder it came to the wrong conclusion!
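A minimal sketch of that calculation with statsmodels:

    from statsmodels.stats.power import TTestIndPower

    # Power of a two-sample t-test: standardized effect size 0.2,
    # 100 observations per group, 5% significance level
    power = TTestIndPower().power(effect_size=0.2, nobs1=100, alpha=0.05)
    print(power)  # roughly 0.29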
Power is important when doing inference, because it tells us how likely we are to detect a significant effect, if one exists.
10. Solving for power
We can also solve for any one missing parameter using the solve_power function. Here we set the sample size nobs1 to None to indicate we want to solve for it. We set power to eighty percent as a reasonably high power, but this is up to the user. The output tells us we need at least 13,735 people in each group to achieve a power of eighty percent.
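A sketch of that call; note that the required sample size depends entirely on the effect size supplied, so the zero-point-two below is only a placeholder:

    # Setting nobs1=None tells solve_power which parameter to find
    analysis = TTestIndPower()
    n_needed = analysis.solve_power(effect_size=0.2, nobs1=None, alpha=0.05, power=0.8)
    print(n_needed)  # per-group sample size needed for 80% power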
11. Let's practice!
Now, let's practice!