1. Calculating sample size
Congratulations, you have almost finished your first chapter on A/B testing.
2. Calculating the sample size of our test
Let’s finish covering the knowledge we need to calculate our test’s required sample size.
3. Null hypothesis
First, let’s discuss the Null hypothesis. This is the hypothesis that our control and treatment, that is our two phrases, have the same impact on the response. Any observed difference is just due to randomness. If we can conclude this is not the case, then we say our results are statistically significant and that there is a difference.
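Stated formally for a conversion-rate test, the two hypotheses can be written as below. The symbols p_control and p_treatment are notation assumed here for the true conversion rates of the two groups; they are not taken from the slides.

```latex
H_0 : p_{\text{control}} = p_{\text{treatment}}
\qquad \text{versus} \qquad
H_1 : p_{\text{control}} \neq p_{\text{treatment}}
```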
4. Types of error & confidence level
Rejecting the null hypothesis when it is true is called a type I error, and retaining a false null hypothesis is called a type II error.
We define the probability of not making a type I error as the confidence level. We will not go into great detail, but intuitively it should make sense that the higher we make this value, the larger the sample we will need. A common value for the confidence level is 0.95.
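To make the link between confidence level and type I error concrete, here is a minimal sketch that converts a confidence level into the type I error rate (alpha) and the matching critical z value. It assumes a two-sided z-test, and the helper name `critical_value` is hypothetical; it uses only the Python standard library.

```python
from statistics import NormalDist


def critical_value(confidence_level: float) -> float:
    """Two-sided critical z value for a given confidence level (assumes a z-test)."""
    alpha = 1 - confidence_level  # probability of a type I error
    # Split alpha evenly across both tails of the standard normal distribution.
    return NormalDist().inv_cdf(1 - alpha / 2)


for cl in (0.90, 0.95, 0.99):
    print(f"CL={cl:.2f}  alpha={1 - cl:.2f}  z={critical_value(cl):.3f}")
```

A higher confidence level gives a larger critical value, so the observed difference has to be further from zero before we call it significant. This is why, for the same effect, a higher confidence level calls for more data.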
5. Statistical power
Related to this is the idea of Statistical Power. Power is the probability of finding statistically significant results when the Null hypothesis is false.
6. Connecting the Different Components
Power and confidence level are connected to the standard error and sensitivity of our test. To estimate our needed sample size, we can choose our desired sensitivity, set our desired confidence level and power, and then estimate our standard error using these values.
7. Power formula
Here is a formula for power. The details are out of scope for this course. Suffice it to say that Phi represents the cumulative distribution function of the standard normal distribution and the v's are our variances.
The key takeaway is the relation between power and n, our sample size: as n goes up, so does our power. Additionally, holding everything else fixed, as our confidence level goes up our power goes down.
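The slide formula itself is not part of this transcript. For reference, one standard approximation for the power of a two-sided test comparing two conversion rates, with n users in each group, is sketched below. It may differ in detail from the formula on the slide, and the symbols p_1, p_2, v_1, v_2 and the pooled quantities are notation assumed here.

```latex
\text{Power} \approx
\Phi\!\left( \frac{\sqrt{n}\,\lvert p_2 - p_1 \rvert - z_{1-\alpha/2}\sqrt{2\bar{v}}}{\sqrt{v_1 + v_2}} \right)
+ 1 -
\Phi\!\left( \frac{\sqrt{n}\,\lvert p_2 - p_1 \rvert + z_{1-\alpha/2}\sqrt{2\bar{v}}}{\sqrt{v_1 + v_2}} \right),
\qquad
\begin{aligned}
v_i &= p_i (1 - p_i), \\
\bar{p} &= \tfrac{1}{2}(p_1 + p_2), \quad \bar{v} = \bar{p}(1 - \bar{p}), \\
\alpha &= 1 - \text{CL}.
\end{aligned}
```

In this form both relations are visible: n enters only through the square root multiplying the difference in rates, so a larger n raises power, while a higher confidence level raises the critical value z and lowers power.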
8. Sample size function
Here is that function implemented in Python, now solving for n rather than power. Again, the details can be explored on your own. All that is important is understanding the relations between these various values.
9. Calculating our needed sample size
Let us now return to our example and apply this function to find the sample size needed for our test. In the previous chapter we found a baseline conversion rate of 0.03468. Let us choose 0.95 as our confidence level and 0.8 as our desired power. Plugging these into our sample size function, we can see that to run the test at these levels we will need a sample of size 45788 for each group.
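The calculation above can be sketched as follows. This is not necessarily the course's own implementation: it assumes a two-sided two-proportion z-test, the function names `get_power` and `get_sample_size` are illustrative, and the 10% relative lift used as the sensitivity is an assumption, since the transcript does not state the value used.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def get_power(n, p1, p2, cl):
    """Approximate power of a two-sided two-proportion z-test with n users per group."""
    alpha = 1 - cl
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # critical value for the chosen confidence level
    diff = abs(p2 - p1)
    v1 = p1 * (1 - p1)                      # variance of a single control observation
    v2 = p2 * (1 - p2)                      # variance of a single treatment observation
    bp = (p1 + p2) / 2
    bv = bp * (1 - bp)                      # pooled variance under the null hypothesis
    scale = (v1 + v2) ** 0.5
    upper = _STD_NORMAL.cdf((n ** 0.5 * diff - z * (2 * bv) ** 0.5) / scale)
    lower = 1 - _STD_NORMAL.cdf((n ** 0.5 * diff + z * (2 * bv) ** 0.5) / scale)
    return upper + lower


def get_sample_size(power, p1, p2, cl, max_n=10_000_000):
    """Smallest n per group that reaches the target power, found by binary search."""
    if get_power(max_n, p1, p2, cl) < power:
        raise ValueError("target power not reachable; increase max_n")
    lo, hi = 1, max_n
    while lo < hi:
        mid = (lo + hi) // 2
        if get_power(mid, p1, p2, cl) >= power:
            hi = mid       # mid is enough; look for something smaller
        else:
            lo = mid + 1   # mid is too small
    return lo


baseline = 0.03468                 # baseline conversion rate from the example
lift = 0.1                         # ASSUMED sensitivity: a 10% relative lift
treatment = baseline * (1 + lift)
n = get_sample_size(0.8, baseline, treatment, 0.95)
print(n)
```

Binary search works here because power only increases with n. Under the assumed 10% lift, the result should land close to the 45788 quoted above; small differences would come from the exact formula variant and search step used.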
10. Generality of this function
In the exercise, you will further explore this function to gain a deeper intuition.
Note that this function is specific to calculations with conversion rates. The functions and calculations for other classes of response variables are analogous and, with the knowledge of this case, should be easy to unpack on your own.
11. Decreasing the needed sample size
It is important to note that there are various ways to decrease the needed sample size.
One is switching the unit of observation in a way that reduces variability in the data, such as from revenue to conversion.
Another is excluding users who are irrelevant to the process. For example, if we did not exclude users who never saw a paywall, we would have a more variable set of users and thus a higher sample size requirement.
More of the relationships that affect our sample size can be explored by studying the power equation shown earlier, which ties these quantities together.
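One way to explore these relationships is with a closed-form approximation for the per-group sample size. The sketch below assumes a two-sided two-proportion z-test and ignores the negligible far-tail term; the function name `approx_sample_size` and the lift values are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist


def approx_sample_size(p1, lift, cl=0.95, power=0.8):
    """Closed-form per-group n for a two-sided two-proportion z-test (far tail ignored)."""
    nd = NormalDist()
    p2 = p1 * (1 + lift)                            # treatment rate implied by the relative lift
    z_alpha = nd.inv_cdf(1 - (1 - cl) / 2)          # critical value for the confidence level
    z_beta = nd.inv_cdf(power)                      # quantile matching the desired power
    bp = (p1 + p2) / 2
    pooled = (2 * bp * (1 - bp)) ** 0.5             # standard deviation term under the null
    unpooled = (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5  # same term under the alternative
    return math.ceil(((z_alpha * pooled + z_beta * unpooled) / (p2 - p1)) ** 2)


baseline = 0.03468
for lift in (0.05, 0.10, 0.20):                     # ASSUMED sensitivities, for illustration
    print(f"lift={lift:.2f}  n per group={approx_sample_size(baseline, lift)}")
```

Because the difference in rates is squared in the denominator, halving the lift we want to detect roughly quadruples the required sample, while anything that shrinks the variance terms shrinks n in proportion.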
12. Let's practice!
Now you are ready to start designing an A/B test. Let’s practice!