Hypothesis testing - Difference of means
We want to test the hypothesis that there is a difference in the average donations received from A and B. Previously, you learned how to generate one permutation of the data. Now, we will generate a null distribution of the difference in means and then calculate the p-value.
For the null distribution, we first generate multiple permuted datasets and store the difference in means for each one. We then calculate the test statistic as the difference in means of the original (unpermuted) dataset. Finally, we approximate the two-sided p-value as twice the fraction of permuted differences that are greater than or equal to the absolute value of the test statistic. Doubling the one-tail fraction assumes the null distribution is roughly symmetric around zero. A p-value below a chosen significance level, such as 0.05, would then indicate a statistically significant difference in means.
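The three steps above (build a null distribution from permutations, compute the observed statistic, double the upper-tail fraction) can be sketched end to end as a single function. This is a minimal illustration, not the course's solution code. The function name `permutation_test_diff_means`, the use of `np.random.default_rng` for reproducibility, the cap at 1.0, and the exponential "donation" data in the usage lines are all assumptions made for the example.

```python
import numpy as np


def permutation_test_diff_means(a, b, reps=10_000, seed=None):
    """Two-sided permutation test for a difference in means (hypothetical helper).

    Returns (test_stat, p_val). The p-value doubles the upper-tail fraction,
    which assumes the null distribution is roughly symmetric around zero.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    data = np.concatenate([a, b])  # pool both groups under the null hypothesis
    n_a = len(a)

    # Null distribution: difference in means for each randomly relabelled dataset
    samples = np.empty(reps)
    for i in range(reps):
        perm = rng.permutation(len(data))  # shuffle which values go to "A" and "B"
        samples[i] = data[perm[:n_a]].mean() - data[perm[n_a:]].mean()

    # Observed test statistic on the original, unpermuted data
    test_stat = a.mean() - b.mean()

    # Doubling can slightly exceed 1 when test_stat is near 0, so cap it (an added safeguard)
    p_val = min(1.0, 2 * np.sum(samples >= np.abs(test_stat)) / reps)
    return test_stat, p_val


# Hypothetical donation amounts, for illustration only
demo_rng = np.random.default_rng(0)
demo_A = demo_rng.exponential(scale=10.0, size=50)
demo_B = demo_rng.exponential(scale=12.0, size=60)
stat, p = permutation_test_diff_means(demo_A, demo_B, reps=5000, seed=1)
print(f"test statistic = {stat:.3f}, p-value = {p:.4f}")
```

Passing a `seed` makes the simulated p-value reproducible; with a different seed it will vary slightly, and the variation shrinks as `reps` grows.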
This exercise is part of the course Statistical Simulation in Python.
Exercise instructions
- Generate multiple permutations of donations_A and donations_B and assign them to perm.
- Set samples equal to the difference in means of permuted_A_datasets and permuted_B_datasets. We set axis=1 to get one mean per permuted dataset instead of a single overall mean.
- Set test_stat equal to the difference in means of donations_A and donations_B.
- Calculate the p-value p_val as twice the fraction of samples greater than or equal to the absolute value of test_stat.
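To see what indexing with a 2-D permutation array and averaging with axis=1 actually do, here is a toy example. The four data values, the two hand-written permutations in perm, and the group size n_a are hypothetical, chosen so the results are easy to check by hand.

```python
import numpy as np

# Toy pooled data and a hand-written 2-D index array: one row per "permutation"
data = np.array([10.0, 20.0, 30.0, 40.0])
perm = np.array([[0, 1, 2, 3],
                 [3, 2, 1, 0]])
n_a = 2  # hypothetical size of group A

# Indexing with a 2-D integer array returns a 2-D array with the same shape
permuted_A = data[perm[:, :n_a]]  # rows: [10, 20] and [40, 30]
permuted_B = data[perm[:, n_a:]]  # rows: [30, 40] and [20, 10]

# axis=1 averages within each row: one mean per permuted dataset (15.0 and 35.0)
row_means_A = np.mean(permuted_A, axis=1)
# Without axis, np.mean collapses everything into one overall mean (25.0)
overall_A = np.mean(permuted_A)

# One difference in means per permutation (-20.0 and 20.0)
samples = np.mean(permuted_A, axis=1) - np.mean(permuted_B, axis=1)
print(row_means_A, overall_A, samples)
```

With reps permutations, perm has reps rows, so samples ends up with exactly reps entries, one per simulated dataset.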
The sample code below follows these steps.
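import numpy as np

# donations_A, donations_B and reps are assumed to be preloaded, as in the course environment
# Pool both groups so the permutation indices can draw from either one (assumed from the previous exercise)
data = np.concatenate([donations_A, donations_B])
# Generate permutations equal to the number of repetitions
perm = np.array([np.random.permutation(len(donations_A) + len(donations_B)) for i in range(reps)])
permuted_A_datasets = data[perm[:, :len(donations_A)]]
permuted_B_datasets = data[perm[:, len(donations_A):]]
# Calculate the difference in means for each of the datasets
samples = np.mean(permuted_A_datasets, axis=1) - np.mean(permuted_B_datasets, axis=1)
# Calculate the test statistic and p-value
test_stat = np.mean(donations_A) - np.mean(donations_B)
p_val = 2*np.sum(samples >= np.abs(test_stat))/reps
print("p-value = {}".format(p_val))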
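Doubling the upper-tail fraction relies on the null distribution being roughly symmetric around zero, which may not hold exactly when the two groups differ in size. A common alternative, shown here as an assumption rather than the course's method, counts permuted differences whose absolute value is at least as large as the absolute test statistic. The helper name `two_sided_p_values`, the seeded `np.random.default_rng` generator (swapped in for `np.random.permutation` so results are reproducible), and the exponential donation data are all hypothetical.

```python
import numpy as np


def two_sided_p_values(samples, test_stat):
    """Return (doubled one-tail p-value, absolute-value p-value). Hypothetical helper."""
    samples = np.asarray(samples, dtype=float)
    # Approach used in this exercise: double the upper-tail fraction
    p_doubled = 2 * np.sum(samples >= np.abs(test_stat)) / len(samples)
    # Alternative: fraction of permutations at least as extreme in either direction
    p_abs = np.mean(np.abs(samples) >= np.abs(test_stat))
    return p_doubled, p_abs


# Hypothetical donation data with unequal group sizes, for illustration only
rng = np.random.default_rng(42)
donations_A = rng.exponential(scale=10.0, size=40)
donations_B = rng.exponential(scale=15.0, size=25)
data = np.concatenate([donations_A, donations_B])
n_a, reps = len(donations_A), 5000

# Null distribution built the same way as the sample code, but with a seeded Generator
perm = np.array([rng.permutation(len(data)) for _ in range(reps)])
samples = data[perm[:, :n_a]].mean(axis=1) - data[perm[:, n_a:]].mean(axis=1)
test_stat = donations_A.mean() - donations_B.mean()

p_doubled, p_abs = two_sided_p_values(samples, test_stat)
print(f"doubled tail: {p_doubled:.4f}, absolute-value version: {p_abs:.4f}")
```

When the null distribution is close to symmetric the two values should be similar; the absolute-value version also stays within [0, 1] by construction.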