
Experimental design: setting up testing parameters

1. Experimental design: setting up testing parameters

Let's talk about setting up experimental design parameters.

2. Distribution parameters

Following up on our checkout page A/B test example, recall that the difference between the mean purchase rates, d, follows a normal distribution. Therefore, the null hypothesis, which assumes no difference in purchase rates between the groups, is centered around zero, while the alternative hypothesis is centered around the observed difference between the mean purchase rates, with the same standard error of the difference. Our goal is to test whether this difference is unlikely to occur under the null hypothesis. If it is unlikely, we reject the null hypothesis and claim that the difference is statistically significant.
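As a minimal sketch of these two distributions, the snippet below places the null at zero and the alternative at the observed difference, each with the same standard error. The purchase rates and sample sizes are assumptions for illustration, not values from the course data.

```python
import numpy as np
from scipy import stats

p_control, p_treatment = 0.10, 0.12   # assumed purchase rates per group
n_control = n_treatment = 5000        # assumed sample size per group

# Observed difference and its standard error
d = p_treatment - p_control
se = np.sqrt(p_control * (1 - p_control) / n_control
             + p_treatment * (1 - p_treatment) / n_treatment)

# Null: centered at 0; alternative: centered at the observed difference
null_dist = stats.norm(loc=0, scale=se)
alt_dist = stats.norm(loc=d, scale=se)

# How likely is a difference at least this large under the null?
print("P(difference >= observed | H0):", 1 - null_dist.cdf(d))
```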

3. Design parameters and error types

To quantify the likelihood that this difference is meaningful, we need to set up experimental parameters that improve our chances of detecting meaningful differences. The first parameter is statistical power, which is the probability of detecting an effect of a pre-determined size or larger if the treatment results in a true difference in metrics. It is mathematically represented as one minus beta, where beta is the Type II, or false negative, error rate that occurs when we fail to reject the null hypothesis and mistakenly conclude that the treatment has no meaningful effect when in reality it does. Power is commonly set at 80% when sizing the test. Next, we have the minimum detectable effect, also known as the practical significance level. We set it as the size of a treatment effect that we consider significant to the business. In other words, how much the metrics need to move for us to make a decision or change our mind about a default action. Any effect below that level would be considered unimportant.
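One common way to turn power and the minimum detectable effect into a required sample size is a power analysis, sketched below with statsmodels. The baseline purchase rate and the two-percentage-point MDE are assumed values for illustration.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_rate = 0.10   # assumed current purchase rate
mde = 0.02             # minimum detectable effect: +2 percentage points

# Convert the rate difference into a standardized effect size
effect_size = proportion_effectsize(baseline_rate + mde, baseline_rate)

# Solve for the sample size per group at 80% power and 5% significance
n_per_group = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,
    power=0.80,
    alternative="two-sided",
)
print(f"Required sample size per group: {round(n_per_group)}")
```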

4. Design parameters and error types

The third parameter is the significance level, alpha, which represents the false positive rate: the probability that we mistakenly reject the null hypothesis and conclude that the change has a real effect when it doesn't. This is ultimately how we decide whether the test result is statistically significant. Recall that the p-value is the probability of obtaining the observed result, or a more extreme one, under the null hypothesis. If this probability is lower than our pre-selected significance level, we conclude that the observed difference between the groups would be unlikely if the null hypothesis were true, and we reject it.
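A minimal sketch of this decision rule is shown below: a two-proportion z-test produces the p-value, which is then compared against alpha. The conversion counts and visitor numbers are hypothetical.

```python
from statsmodels.stats.proportion import proportions_ztest

alpha = 0.05
purchases = [600, 520]   # conversions in treatment and control (assumed)
visitors = [5000, 5000]  # visitors per group (assumed)

# Two-proportion z-test for the difference in purchase rates
z_stat, p_value = proportions_ztest(count=purchases, nobs=visitors)
print(f"z = {z_stat:.2f}, p-value = {p_value:.4f}")

if p_value < alpha:
    print("Reject H0: the difference is statistically significant.")
else:
    print("Fail to reject H0: no statistically significant difference.")
```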

5. Experiment parameters analogy

Let's solidify these concepts with an analogy. Imagine sending someone to the grocery store to buy a bag of chips. The person comes back and says they couldn't find it. The question we are trying to answer is: if the bag of chips was there, what is the probability that the person would have found it? The answer depends on (1) the time spent at the store, (2) the size of the bag of chips, and (3) how organized the store is. This is the same question we encounter when we run an A/B test to determine whether landing page B is better than A at converting customers. The time spent at the store is analogous to the sample size or the duration of the experiment, the size of the bag of chips corresponds to the minimum detectable effect we are trying to capture, and how organized the store is resembles the variance, or how spread out the data is. If the person spends a reasonable amount of time looking for a family-size bag of chips in a well-organized store, there is a much better chance of finding it if it is actually there.

6. Let's practice!

Let's look at some exercises.
