1. A/B testing
Welcome back! Let's apply what you have learned so far to A/B testing!
2. A/B testing
A/B testing is a method for assessing user experience based on a randomized experiment, in which we divide our users into two groups.
3. A/B testing
We expose each group to a different version of something, for instance, we show each group a different version of a website layout.
4. A/B testing
Then, we compare the two groups on some metric, such as which website version generated a higher click-through rate. Such an A/B test allows us to pick the better version of the website, which is then shown to all users.
5. A/B testing: frequentist way
The typical, frequentist approach to A/B testing is based on a statistical procedure known as hypothesis testing. The main drawback of this approach is that we can only conclude which group is better, but not how much better it is.
6. A/B testing: Bayesian approach
Alternatively, the Bayesian approach allows us to calculate the posterior click-through rates for websites A and B, compare them directly, and calculate the probability that one is better than the other. We can also quantify how much better it is, and even estimate the expected loss in case we make a wrong decision and deploy the worse website version.
7. A/B testing: Bayesian approach
You already know how to set up Bayesian A/B testing. To model whether a user clicks or doesn't click, you can use the binomial distribution, with a click being a success and the click rate being the probability of success.
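A minimal sketch of this model, assuming numpy and a purely hypothetical true click rate, could look like this:

    import numpy as np

    # Hypothetical true click rate, for illustration only
    true_click_rate = 0.15

    # Each visit is one binomial trial: 1 = click (success), 0 = no click
    clicks = np.random.binomial(n=1, p=true_click_rate, size=1000)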
8. Simulate beta posterior
We've seen that for binomial data, a beta prior generates a beta posterior: starting from a Beta(alpha, beta) prior, observing the data simply updates it to Beta(alpha + number of successes, beta + number of failures). This allows us to sample the posterior draws directly from the appropriate beta distribution.
Here is a custom function that you will use for this, called simulate_beta_posterior. It implements the update formulas above and works just like the get_heads_prob function you have used before. The only difference is that, in addition to the 0-1 data, you pass the two beta prior parameters as arguments. As a result, you get 10000 posterior draws, just like before.
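The course's implementation isn't reproduced here, but based on the update formulas above, a sketch of simulate_beta_posterior, assuming numpy and these parameter names, might look like this:

    import numpy as np

    def simulate_beta_posterior(trials, beta_prior_a, beta_prior_b):
        # Number of successes (clicks) in the 0-1 data
        num_successes = np.sum(trials)
        # Posterior: Beta(prior_a + successes, prior_b + failures)
        posterior_draws = np.random.beta(
            beta_prior_a + num_successes,
            beta_prior_b + len(trials) - num_successes,
            size=10000,
        )
        return posterior_draws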
9. Comparing posteriors
Imagine you have lists of 1s (clicks) and 0s (no-clicks) from the website traffic, one for each of two website layouts: A and B. You can use the simulate_beta_posterior function to simulate posterior draws. Here, we are using a beta-1-1 prior. We can plot the two posteriors to see that B seems to be better, although the two overlap.
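As a sketch, assuming clicks_A and clicks_B are those two 0-1 lists and that seaborn and matplotlib are available for plotting:

    import seaborn as sns
    import matplotlib.pyplot as plt

    # Posterior click rates for each layout, using a Beta(1, 1) prior
    posterior_A = simulate_beta_posterior(clicks_A, 1, 1)
    posterior_B = simulate_beta_posterior(clicks_B, 1, 1)

    # Plot both posteriors to compare them visually
    sns.kdeplot(posterior_A, label="A")
    sns.kdeplot(posterior_B, label="B")
    plt.legend()
    plt.show()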
10. Comparing posteriors
We can subtract the posterior draws for A from those for B to calculate the posterior difference between the click rates. This difference is very likely to be positive, which corresponds to B being better.
To get the explicit probability of B being better than A, we can create a Boolean array that is True when B is better and False otherwise, and compute its mean. Here, there is a 96% probability that the B website layout is better!
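Continuing the sketch (the variable names diff and prob_B_better are assumptions, not from the course):

    # Posterior difference between the click rates of B and A
    diff = posterior_B - posterior_A

    # Probability that B's click rate is higher than A's
    prob_B_better = (diff > 0).mean()
    print(prob_B_better)  # around 0.96 in this example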
11. Expected loss
We can also estimate the expected loss resulting from accidentally deploying a worse version. First, we slice the difference between the two posteriors to take only the rare cases where A is better. This is our loss. Then, we take the average to get the expected loss.
If we deploy version B, which we know is better with 96% probability, but the 4% risk materializes and it turns out A was better, we will only lose 0-point-7 percentage points in the click-through rate.
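Sticking with the same sketch and the diff variable from above:

    # Keep only the rare cases where A is better, i.e. the difference is negative
    loss = diff[diff < 0]

    # Expected loss if we deploy B but A turns out to be better
    expected_loss = loss.mean()
    print(expected_loss)  # a small negative number, about -0.007 here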
12. Ads data
In this chapter, you will work with ads data adapted from Kaggle, which contains information on whether ad banners for different products, displayed on different site versions, were clicked or not.
13. Let's A/B test!
Let's A/B test!