Analyzing difference in proportions A/B tests

1. Analyzing difference in proportions A/B tests

We will now talk about analyzing the difference in proportion metrics.

2. Framework for difference in proportions

Continuing with the checkout page design A/B test, we previously calculated that detecting an absolute purchase rate difference of around two and a half percent requires approximately 3000 users per group. We ran the test, and after performing sanity checks, it is time to start analyzing our results. To evaluate whether the difference in purchase rates between groups A and B is statistically significant, we perform a two-sample z-test for proportions using the proportions_ztest function from statsmodels. This gives us the p-value of the test. If the p-value is lower than the significance threshold of five percent, we reject the null hypothesis and conclude that the treatment effect is statistically significant. This, however, does not tell us what to expect the new purchase rate to be. To get an estimate for it, we lean on the concept of confidence intervals. Claiming that the metric calculated from the sample data is the true treatment effect in the population is like fishing with a spear in a murky lake. Relying on confidence intervals, on the other hand, is like using a net. A thrown spear has a high chance of missing the fish, while a net covers a wider area and is more likely to capture it. A 95% confidence interval is a range of plausible values, centered on the observed difference between the treatment and the control, produced by a procedure that captures the true difference 95% of the time over repeated samples.

3. Two sample proportions z-test

Let's make our imports from the statsmodels library. Recall that the proportions_ztest function takes two array-like objects: the number of successes and the number of observations in each group. Since our test was randomized at the user level, we get the number of unique user ids in each group using the nunique method, as well as the number of unique users who made at least one purchase, and assign them to the respective lists.
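The counting step described here can be sketched as follows. This is a minimal sketch, not the course's actual code: the DataFrame name `checkout`, the column names `user_id`, `checkout_page` and `purchased`, and the simulated purchase rates are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the checkout data.
# Column names and purchase rates are hypothetical, chosen for illustration.
rng = np.random.default_rng(42)
n_per_group = 3000
checkout = pd.DataFrame({
    "user_id": np.arange(2 * n_per_group),
    "checkout_page": np.repeat(["A", "B"], n_per_group),
    "purchased": np.concatenate([
        rng.binomial(1, 0.82, n_per_group),  # assumed rate for design A
        rng.binomial(1, 0.85, n_per_group),  # assumed rate for design B
    ]),
})

group_a = checkout[checkout["checkout_page"] == "A"]
group_b = checkout[checkout["checkout_page"] == "B"]

# Number of unique users in each group (the randomization unit)
n_a = group_a["user_id"].nunique()
n_b = group_b["user_id"].nunique()

# Number of unique users with at least one purchase in each group
purchases_a = group_a.loc[group_a["purchased"] == 1, "user_id"].nunique()
purchases_b = group_b.loc[group_b["purchased"] == 1, "user_id"].nunique()

# Lists in the order [A, B], ready to pass to the statsmodels functions
successes = [purchases_a, purchases_b]
nobs = [n_a, n_b]
print(successes, nobs)
```

Counting unique user ids rather than rows keeps the unit of analysis aligned with the unit of randomization, which matters when a user can appear in more than one row.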

4. Two sample proportions z-test

Finally, we pass the lists to the proportions_ztest and proportion_confint functions, using a significance level of five percent, which is the default alpha in proportion_confint and corresponds to a ninety five percent confidence level. Since the p-value is less than our significance threshold, we reject the null hypothesis that the two groups have equal purchase rates. Looking at the confidence interval of group B, we conclude that if we roll out design B of the checkout page, we can expect our purchase rate to lie somewhere between eighty three point five and eighty six percent with ninety five percent confidence.
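A minimal sketch of this step is shown below. The counts are illustrative assumptions, not the course's data, so the printed numbers will not match the figures quoted above exactly. Both functions are imported from `statsmodels.stats.proportion`.

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

# Illustrative counts in the order [A, B]; hypothetical, not the course data.
successes = np.array([2460, 2541])  # users with at least one purchase
nobs = np.array([3000, 3000])       # unique users per group

# Two-sample z-test for proportions (two-sided by default)
zstat, pvalue = proportions_ztest(count=successes, nobs=nobs)

# Per-group 95% confidence intervals (alpha=0.05 is the default)
ci_low, ci_upp = proportion_confint(successes, nobs, alpha=0.05)

alpha = 0.05
print(f"z-statistic: {zstat:.3f}, p-value: {pvalue:.4f}")
print("Reject the null hypothesis" if pvalue < alpha
      else "Fail to reject the null hypothesis")
print(f"Group A 95% CI: ({ci_low[0]:.3f}, {ci_upp[0]:.3f})")
print(f"Group B 95% CI: ({ci_low[1]:.3f}, {ci_upp[1]:.3f})")
```

Note that `proportion_confint` returns an interval for each group's own rate. The interval for group B is what the rollout statement relies on; it is not an interval for the difference between the groups.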

5. Confidence intervals for proportions

It is important to understand that the confidence level is a statement about the long-run proportion of intervals that capture the true population parameter. Let's assume that the users in group B, who saw checkout page B, represent our whole population of users landing on this checkout page. We calculate this population purchase rate to be zero point eight four seven using the mean method.

6. Confidence intervals for proportions

We then create a for loop to take 20 different samples of size 100 from group B using the sample method, sum the total number of purchases in each sample, and pass it as the first argument to the proportion_confint function. We also pass the sample size, representing the number of trials, and an alpha of 0.10, corresponding to a 90% confidence level, to generate 20 90% confidence intervals. Notice that two out of the 20, or 10% of the intervals, do not capture our true rate of point eight four seven, which is in line with what we expect given our chosen alpha of 10%.
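The resampling loop can be sketched like this. The "population" below is a hypothetical series built to have a purchase rate of exactly 0.847, and the seeds are arbitrary, so the number of intervals that miss the true rate will vary from run to run and need not equal two.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.proportion import proportion_confint

# Hypothetical "population": 3000 group B users, 2541 of whom purchased,
# giving a true purchase rate of 0.847.
purchased_b = pd.Series(np.repeat([1, 0], [2541, 459]))
true_rate = purchased_b.mean()

sample_size = 100
intervals = []
for i in range(20):
    # Draw a random sample of users without replacement (seed is arbitrary)
    sample = purchased_b.sample(n=sample_size, random_state=i)
    # 90% confidence interval for the sample purchase rate
    low, high = proportion_confint(sample.sum(), sample_size, alpha=0.10)
    intervals.append((low, high))

# Count how many intervals contain the true population rate
captured = sum(low <= true_rate <= high for low, high in intervals)
print(f"True rate: {true_rate:.3f}")
print(f"{captured} of {len(intervals)} intervals capture the true rate")
```

With a 90% confidence level, roughly 18 of 20 intervals are expected to contain the true rate on average, but any single batch of 20 can land above or below that.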

7. Let's practice!

Let's solidify our knowledge with some exercises.