1. Non-parametric statistical tests
For every statistical test we have run so far, we had to make certain assumptions. We call those tests parametric tests. In this video we will explore the need for non-parametric tests, which do not require strict assumptions about the data, and go over how to use them for AB testing.
2. Parametric tests assumptions
There are three main assumptions that we have to make for parametric tests to hold. First, the experimental units have to be randomly sampled from the larger population.
Also, each row in our data needs to be independent of the others. Except for specific scenarios where we know the data is paired and can use an appropriate test, such as a paired t-test, to account for it, our observations need to be independent.
Finally, the sampled metric's data needs to be normally distributed so that the central limit theorem applies.
Recall that for the theorem to hold, either the underlying distribution of the data needs to be normal, or the sample size needs to be big enough. A good rule of thumb is that for a two-sample t-test we need at least 30 observations in each group, and for a two-sample test for proportions we need at least ten successes and ten failures in each group.
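As a rough illustration, a quick check of these rules of thumb could look like the following sketch in Python; the data and column names here are made up purely for illustration.

import numpy as np
import pandas as pd

# Hypothetical example data: a group label and a binary conversion flag
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "group": rng.choice(["A", "B"], size=200),
    "converted": rng.integers(0, 2, size=200),
})

# Two-sample t-test rule of thumb: at least 30 observations per group
group_sizes = df.groupby("group").size()
print(group_sizes >= 30)

# Two-sample proportion test rule of thumb: at least 10 successes and 10 failures per group
successes = df.groupby("group")["converted"].sum()
failures = group_sizes - successes
print((successes >= 10) & (failures >= 10))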
3. Mann-Whitney U test
In cases where we don't know enough about the data to validate the parametric test assumptions, or when we are explicitly in violation of them, we resort to non-parametric tests.
One example is the Mann-Whitney U test, which is a statistical significance test used to determine if two independent samples were drawn from a population with the same distribution. This test belongs to a class of tests called rank sum tests since it uses the ranks of the data. The samples also have to be unpaired.
4. Mann-Whitney U test in Python
Calculating the mean time on page and the row counts per variant of the checkout dataset, we can see that each group has 3000 rows. To intentionally violate one of the parametric test assumptions, let's sample only 25 rows from each of groups A and B. Assuming the data is non-normal, the sample size of fewer than 30 means we can no longer use a parametric test.
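A minimal sketch of this sampling step, using a made-up stand-in for the checkout data (the real dataset's column names may differ), could look like this:

import numpy as np
import pandas as pd

# Hypothetical stand-in for the checkout dataset: a variant label and time on page
rng = np.random.default_rng(0)
checkout = pd.DataFrame({
    "variant": np.repeat(["A", "B"], 3000),
    "time_on_page": rng.exponential(scale=45, size=6000),  # skewed, non-normal
})

# Mean time on page and row counts per variant
print(checkout.groupby("variant")["time_on_page"].agg(["mean", "count"]))

# Deliberately keep only 25 rows per group to violate the sample-size rule of thumb
sample_A = checkout.loc[checkout["variant"] == "A", "time_on_page"].sample(25, random_state=1)
sample_B = checkout.loc[checkout["variant"] == "B", "time_on_page"].sample(25, random_state=1)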
5. Mann-Whitney U test in Python
To run the non-parametric Mann-Whitney U test instead, we leverage the pingouin dot mwu function, which takes the data from the two columns whose distributions we are comparing. We run the test and get a p-value lower than the significance threshold of 5%, suggesting that we can reject the Null hypothesis that the data from groups A and B came from the same distribution.
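A self-contained sketch of that call, with made-up samples, might look like the following; pingouin.mwu returns a one-row DataFrame that includes the U statistic and a p-val column.

import numpy as np
import pingouin

# Hypothetical small samples of time on page for variants A and B
rng = np.random.default_rng(1)
sample_A = rng.exponential(scale=40, size=25)
sample_B = rng.exponential(scale=55, size=25)

# Mann-Whitney U test on the two unpaired samples
result = pingouin.mwu(x=sample_A, y=sample_B, alternative="two-sided")
print(result)

# Compare the p-value against the 5% significance threshold
print(result["p-val"].item() < 0.05)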
6. Chi-square test of independence
Another non-parametric test that does not require assumptions about the population parameters is the Chi-square test of independence. It tests whether categorical variables are independent of each other. The Null hypothesis states independence, while the alternative states the opposite.
7. Chi-square test in Python
In the homepage design AB test, the Null hypothesis is that there is no significant difference in signup rates between the landing pages. In other words, the signup rates are independent of the landing page variants. The alternative hypothesis states the opposite. To run the chi-square test, we calculate the number of users who signed up and those who didn't in each group.
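A minimal sketch of that counting step, using a made-up stand-in for the homepage data (the column names are hypothetical), could look like this:

import numpy as np
import pandas as pd

# Hypothetical stand-in for the homepage AB test: a landing-page variant and a signup flag
rng = np.random.default_rng(2)
homepage = pd.DataFrame({
    "landing_page": rng.choice(["control", "treatment"], size=4000),
    "signed_up": rng.integers(0, 2, size=4000),
})

# Number of users who signed up and who did not, per variant
signup_counts = pd.crosstab(homepage["landing_page"], homepage["signed_up"])
print(signup_counts)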
8. Chi-square test in Python
We then create a table of the control and treatment numbers and pass it to scipy's chi2_contingency function, then extract the p-value, which is the second element of the output, at index one. The results show that group C has a higher signup rate, and the p-value is lower than the 5% significance threshold, indicating a significant difference.
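A hedged sketch of that step, with made-up counts, could look like the following; scipy.stats.chi2_contingency returns the test statistic, the p-value, the degrees of freedom, and the expected counts.

from scipy.stats import chi2_contingency

# Contingency table of [did not sign up, signed up] counts; the numbers are made up
table = [[1650, 350],   # control
         [1550, 450]]   # treatment

output = chi2_contingency(table)
p_value = output[1]        # the p-value is the second element, at index 1
print(p_value < 0.05)      # True indicates a significant difference at the 5% level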
9. Let's practice!
With that, let's practice some non-parametric test scenarios.