
Hypothesis testing - Difference of means

We want to test the hypothesis that there is a difference in the average donations received from A and B. Previously, you learned how to generate one permutation of the data. Now, we will generate a null distribution of the difference in means and then calculate the p-value.

For the null distribution, we first generate multiple permuted datasets and store the difference in means for each one. We then calculate the test statistic as the difference in means of the original dataset. Finally, we approximate the p-value as twice the fraction of permuted differences that are greater than or equal to the absolute value of the test statistic (a two-sided hypothesis). A p-value below a chosen threshold, say 0.05, would then indicate that the difference is statistically significant.
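The procedure above can be sketched end to end as follows. This is a minimal illustration, not the exercise solution: the donation amounts are synthetic draws, and the seed, the sample sizes, `reps = 1000`, and the use of NumPy's `default_rng` generator are all assumptions made here so the block runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility (assumption)

# Hypothetical donation amounts; the exercise supplies its own donations_A and donations_B
donations_A = rng.exponential(scale=6.0, size=500)
donations_B = rng.exponential(scale=6.0, size=700)
data = np.concatenate([donations_A, donations_B])

reps = 1000  # number of permuted datasets (assumed value)
n_A = len(donations_A)

# One row of shuffled indices per repetition
perm = np.array([rng.permutation(len(data)) for _ in range(reps)])

# The first n_A shuffled indices form a permuted "A" group, the rest a permuted "B" group
permuted_A_datasets = data[perm[:, :n_A]]
permuted_B_datasets = data[perm[:, n_A:]]

# Null distribution: difference in means for each permuted dataset
samples = permuted_A_datasets.mean(axis=1) - permuted_B_datasets.mean(axis=1)

# Test statistic: difference in means of the original groups
test_stat = donations_A.mean() - donations_B.mean()

# Two-sided p-value as described above: twice the one-sided tail fraction
p_val = 2 * np.sum(samples >= np.abs(test_stat)) / reps
print("p-value = {}".format(p_val))
```

Because both synthetic groups here are drawn from the same distribution, the null hypothesis holds by construction, so a small p-value would only appear by chance.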

This exercise is part of the course

Statistical Simulation in Python


Exercise instructions

  • Generate multiple permutations of donations_A and donations_B and assign them to perm.
  • Set samples equal to the difference in means of permuted_A_datasets and permuted_B_datasets. We set axis=1 to get one mean per permuted dataset instead of a single overall mean.
  • Set test_stat equal to the difference in means of donations_A & donations_B.
  • Calculate p-value p_val as twice the fraction of samples greater than or equal to the absolute value of test_stat.
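To see what `axis=1` does in the second step, here is a small toy example. The five values and the two hand-written index permutations are made up for illustration; only the indexing pattern mirrors the exercise code.

```python
import numpy as np

# Toy data: three "A" values followed by two "B" values (made-up numbers)
data = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
n_A = 3

# Two hand-written index permutations, one per row
perm = np.array([[0, 1, 2, 3, 4],
                 [4, 3, 2, 1, 0]])

permuted_A = data[perm[:, :n_A]]  # shape (2, 3): one permuted "A" group per row
permuted_B = data[perm[:, n_A:]]  # shape (2, 2): one permuted "B" group per row

row_means_A = np.mean(permuted_A, axis=1)  # one mean per permuted dataset: [20., 40.]
overall_mean_A = np.mean(permuted_A)       # a single number across all rows: 30.0

# One difference in means per permuted dataset: [-25., 25.]
samples = row_means_A - np.mean(permuted_B, axis=1)
print(row_means_A, overall_mean_A, samples)
```

Without `axis=1`, the rows would be pooled into one number and the null distribution would collapse to a single value.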

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

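import numpy as np

# data, reps, donations_A and donations_B are assumed to be preloaded by the exercise environment
# Generate one permutation per repetition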
perm = np.array([np.random.____(len(____) + len(____)) for i in range(reps)])
permuted_A_datasets = data[perm[:, :len(donations_A)]]
permuted_B_datasets = data[perm[:, len(donations_A):]]

# Calculate the difference in means for each of the datasets
samples = np.mean(____, axis=1) - np.mean(____, axis=1)

# Calculate the test statistic and p-value
test_stat = ____
p_val = 2*np.sum(____ >= np.abs(____))/reps
print("p-value = {}".format(p_val))