
Considerations in A/B testing

1. Considerations in A/B testing

Welcome to considerations in A/B testing.

2. A/B testing considerations

An A/B test may be appropriate provided that meaningful traffic or enough subjects can be gathered, there is enough time to design and carry out the test, and there is a clear hypothesis. Aspects to be aware of in A/B tests include data fluctuations, the number of variables, and regression to the mean. Let's cover these critical considerations for ensuring accurate and reliable results.

3. Fluctuations in data

A sample that is statistically representative of the whole population is needed to estimate results accurately. Fluctuations in external factors can impact results, such as changes in subjects, days of the week, holidays, or public opinion shaped by news or social media. For example, returning website visitors may initially engage less with a change, while the change may draw in new visitors, with previous visitors returning later. Data collected during habit-altering periods, such as the winter holidays, may not properly represent the rest of the year. If data collection ends during a sale, the results may reflect the sale rather than typical behavior. Be aware of events that may influence the conclusion about the hypothesis.

4. Example of fluctuations

Fluctuations in sampled data can be seen here using the sample function. Taking three random samples of ten subjects gives different results each time. It is important to keep these potential impacts in mind, especially when assessing findings.
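As a rough illustration of this kind of fluctuation, here is a minimal Python sketch. It is not the code shown on the slide: the population of session lengths, its distribution, and the seed are all made-up assumptions, and Python's `random.Random.sample` stands in for the sample function mentioned above.

```python
import random
import statistics

# Hypothetical population: 1,000 session lengths in minutes (made-up data).
rng = random.Random(42)
population = [rng.gauss(5.0, 2.0) for _ in range(1000)]

# Draw three random samples of ten subjects each, without replacement.
samples = [rng.sample(population, 10) for _ in range(3)]
sample_means = [statistics.mean(s) for s in samples]

print("Population mean:", round(statistics.mean(population), 2))
for i, m in enumerate(sample_means, start=1):
    print(f"Sample {i} mean:", round(m, 2))
```

Each sample of ten will typically give a somewhat different mean, even though all three come from the same population.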

5. Number of variables

For the most accurate results, assess one variable per A/B test, as additional variables may alter conclusions. Suppose subjects rate their enjoyment of two-topping pizzas: bell pepper and onion, or olive and garlic. Whereas a one-topping test assesses individual toppings, such as pepperoni compared to cheese, the two-topping test assesses topping pairings, not a single ingredient. The two-topping test also has no control to compare against.

6. Variables and type I error

More variables mean more analyses, increasing the likelihood of a Type I error, or false positive. A common significance level is five percent, or an alpha of point-zero-five, meaning there is a five-percent chance of a Type I error when there is no true effect. The confidence level, one hundred percent minus the significance level, can be used to calculate the overall Type I error rate across the tests, or family-wise error rate, by subtracting the probability of no false positives from one. For example, a five-percent significance level gives a confidence level of ninety-five percent. If ten independent tests are run, the probability of no false positives is point-ninety-five to the power of ten, or about point-six. Subtracting this from one gives roughly a forty-percent chance that one or more Type I errors will be made.
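The family-wise error rate calculation can be sketched in a few lines of Python. The function name `family_wise_error_rate` is a hypothetical helper, not something from the course, and the formula assumes the tests are independent.

```python
def family_wise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across independent tests."""
    confidence = 1 - alpha            # e.g. 0.95 for alpha = 0.05
    p_no_false_positives = confidence ** n_tests
    return 1 - p_no_false_positives


# Assumed example values: alpha of 0.05 with 1, 5, and 10 tests.
for n in (1, 5, 10):
    print(n, round(family_wise_error_rate(0.05, n), 3))
```

With an alpha of 0.05, ten tests give a family-wise error rate of about 0.401, which matches the forty-percent figure above.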

7. Regression to the mean

Type I errors can also be influenced by regression to the mean, in which a variable may be more extreme in early or fewer measurements and trend toward the average with additional data. For example, if a submit button is made larger, it may receive more attention at first, but this may regress once the size change is no longer novel to visitors, resulting in a Type I error if enough time is not allowed for data collection. To help assess this, one A/B group should be a control, the original submit button, to compare the change group against.

8. Regression to the mean

Small sample sizes may also lead to a Type I error. As more data is collected, the sample mean becomes increasingly representative of the true mean. For example, in assessing enjoyment of pizza, the first friend may not enjoy the cheese pizza but others may love it, shifting the initial low average toward an accurate mean.
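To see how a small sample can mislead, the following Python sketch simulates yes/no pizza ratings and tracks the running mean as responses accumulate. The true enjoyment rate of 0.7, the seed, and the number of responses are illustrative assumptions only.

```python
import random

rng = random.Random(0)
TRUE_RATE = 0.7  # hypothetical share of people who enjoy the cheese pizza

# Each response is 1 (enjoyed) or 0 (did not enjoy).
responses = [1 if rng.random() < TRUE_RATE else 0 for _ in range(10_000)]

# Running mean after each additional response.
running_means = []
total = 0
for n, r in enumerate(responses, start=1):
    total += r
    running_means.append(total / n)

for n in (1, 10, 100, 1000, 10_000):
    print(f"n={n:>6}: mean so far = {running_means[n - 1]:.3f}")
```

After a single response the mean can only be 0 or 1, while with thousands of responses it settles close to the assumed true rate.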

9. Let's practice!

Let's practice.