
Power and sample size

1. Power and sample size

In the last lesson, we discussed analyzing the results of a hypothesis test. Now we'll take things a step further by looking at another popular interview question: how do you calculate the sample size you need?

2. Power analysis

Let's provide a little context before we dive in. If you run an experiment, how do you decide how long it should run, or in our terms, how many observations are needed per group? This question matters because it's generally advised that you fix a sample size before you start an experiment. Despite what you may read in many guides to A/B testing, there's no single rule of thumb that works everywhere; as is often the case, the answer is: it depends. In practice, the approach we'll use to solve this problem is called power analysis.

3. Moving parts

First, you'll need to know the minimum size of the effect that you want to detect in a test; for example, a 20 percent improvement. Next is the significance level at which the test will be conducted, commonly referred to as the alpha value. Finally, power is the probability of detecting an effect when one truly exists. If a real effect of at least the minimum size is present, we want a high probability that our test will flag it as statistically significant. If we alter any of these parameters, the needed sample size changes: more power, a smaller significance level, or detecting a smaller effect all lead to a larger sample size. The plot_power method in statsmodels does a good job of visualizing this phenomenon.
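To make the direction of each effect concrete, here is a minimal sketch that uses only Python's standard library rather than statsmodels. It assumes a two-sided, two-sample z-test with equal group sizes and uses the usual normal-approximation formula; the helper name n_per_group and the standardized effect sizes 0.10 and 0.05 are illustrative choices, not values from the lesson.

```python
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate observations per group for a two-sided, two-sample
    z-test with equal group sizes (normal approximation)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # two-sided critical value
    z_power = z(power)          # standard normal quantile for the target power
    return 2 * ((z_alpha + z_power) / abs(effect_size)) ** 2

baseline = n_per_group(0.10)
scenarios = {
    "baseline (h=0.10, alpha=0.05, power=0.80)": baseline,
    "more power (0.90)": n_per_group(0.10, power=0.90),
    "smaller alpha (0.01)": n_per_group(0.10, alpha=0.01),
    "smaller effect (h=0.05)": n_per_group(0.05),
}
for label, n in scenarios.items():
    print(f"{label}: {n:.0f}")
```

Each of the three changes pushes the required sample above the baseline scenario, and halving the effect size roughly quadruples it, because n scales with 1 / effect_size squared.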

4. Calculating sample size

We'll be using a couple of functions within the statsmodels.stats.power module, which take in all but one of the aforementioned components and then calculate the remaining parameter. One preliminary step must be taken first: these power functions require a standardized minimum effect size. To get this, we can use the proportion_effectsize function (from statsmodels.stats.proportion), passing in our baseline and desired minimum conversion rates.
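The standardized effect size that proportion_effectsize returns is Cohen's h, the difference between arcsine-transformed proportions. The sketch below reimplements that formula with the standard library so the transformation is visible; the helper cohens_h and the 20 percent and 25 percent conversion rates are hypothetical values chosen for illustration.

```python
from math import asin, sqrt

def cohens_h(p1, p2):
    # Cohen's h: 2*arcsin(sqrt(p1)) - 2*arcsin(sqrt(p2)),
    # the same formula statsmodels' proportion_effectsize uses by default
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

# Hypothetical rates: 20% baseline conversion, 25% desired conversion
h = cohens_h(0.20, 0.25)
print(f"{h:.4f}")  # negative because p1 < p2; for a two-sided test only |h| matters
```

The arcsine transform stabilizes the variance of a proportion, which is why the same 5-point lift maps to a different h near 50 percent than near 5 percent.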

5. Example: conversion rates

Let's look at an example. Since both functions work similarly, we'll focus on a z-test scenario where we are looking at conversion rates. We'll set power to 80 percent, significance at 5 percent, and the minimum effect size at 5 percent as well. We compute the standardized effect size and, once we run zt_ind_solve_power, we get a required sample size of around 1091 impressions.
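The calculation that zt_ind_solve_power performs can be approximated in closed form: with equal group sizes and a two-sided test, each group needs roughly n = 2 * ((z_(1-alpha/2) + z_power) / h)^2 observations, where h is the standardized effect size. statsmodels solves the power equation numerically instead, so this standard-library sketch is an approximation that ignores the negligible chance of rejecting in the opposite tail. The 20 percent baseline and 25 percent target are assumed inputs for illustration, and the helper names are hypothetical.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist

def cohens_h(p1, p2):
    # Standardized difference between two proportions (arcsine transform)
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

def n_per_group(effect_size, alpha=0.05, power=0.80):
    # Normal approximation for a two-sided, two-sample z-test with equal
    # group sizes; the tiny rejection probability in the opposite tail is ignored
    z = NormalDist().inv_cdf
    return 2 * ((z(1 - alpha / 2) + z(power)) / abs(effect_size)) ** 2

# Hypothetical inputs: 20% baseline conversion, 25% target conversion
h = cohens_h(0.20, 0.25)
n = n_per_group(h, alpha=0.05, power=0.80)
print(f"{n:.1f} per group; round up to {ceil(n)}")
```

With these assumed rates the approximation lands just under 1092 per group. In practice you round up, since a fractional observation cannot be collected and rounding down would leave the test slightly underpowered.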

6. Example: conversion rates

Furthermore, we can play around with the input values for the same example and get a different result. For instance, if we raise the power to 0.95, as shown in the final solve power command, we require over 700 more observations, since, all else being equal, higher power demands a larger sample. Interviewers will often tweak the problem after you've already solved it, to get a better sense of how you think; this is a prime example of that.
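Under the normal approximation, n grows with the square of (z_(1-alpha/2) + z_power), so when only the power changes, the effect size cancels out and the increase reduces to a scale factor. This sketch assumes the same two-sided z-test at alpha = 0.05; it scales the lesson's figure of 1091 to show the direction and rough size of the change, and the helper power_scale is hypothetical.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf
z_alpha = z(1 - 0.05 / 2)  # two-sided test at alpha = 0.05

def power_scale(power_from, power_to):
    # Ratio of required sample sizes when only the target power changes;
    # under the normal approximation the effect size cancels out
    return ((z_alpha + z(power_to)) / (z_alpha + z(power_from))) ** 2

factor = power_scale(0.80, 0.95)
n_80 = 1091  # sample size quoted in the lesson at power 0.80
n_95 = n_80 * factor
print(f"scale factor {factor:.2f}: about {n_95:.0f} observations, {n_95 - n_80:.0f} more")
```

The factor is about 1.66, so moving from 80 to 95 percent power costs roughly two thirds more data regardless of the conversion rates involved.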

7. Summary

In this lesson, we touched on a popular interview question and how to solve it using power analysis. We looked at the moving parts that determine the sample size of an A/B test. Lastly, we walked through an example in Python.

8. Let's prepare for the interview!

Let's keep chugging along and head to the exercises!