Hypothesis testing - Difference of means

We want to test the hypothesis that there is a difference in the average donations received from A and B. Previously, you learned how to generate one permutation of the data. Now, we will generate a null distribution of the difference in means and then calculate the p-value.

For the null distribution, we first generate multiple permuted datasets and store the difference in means for each one. We then calculate the test statistic as the difference in means in the original dataset. Finally, we approximate the p-value as twice the fraction of permuted differences that are greater than or equal to the absolute value of the test statistic (a two-sided test). A p-value below a chosen threshold, say 0.05, would then be treated as statistically significant.
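
In symbols, the approximation described above could be written as follows. The notation is introduced here for illustration only: R stands for the number of permutations (reps in the code), t for the test statistic, and d_i for the difference in means of the i-th permuted dataset.

```latex
\[
t = \bar{x}_A - \bar{x}_B, \qquad
d_i = \bar{x}_A^{(i)} - \bar{x}_B^{(i)}, \quad i = 1, \dots, R,
\]
\[
\hat{p} \approx \frac{2}{R} \sum_{i=1}^{R} \mathbf{1}\{\, d_i \ge |t| \,\}
\]
```

The indicator counts the permuted differences that fall in the upper tail at or beyond |t|, and the factor of 2 accounts for the lower tail under the assumption that the null distribution is roughly symmetric around zero.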

This exercise is part of the course

Statistical Simulation in Python

Instructions

  • Generate multiple permutations of donations_A and donations_B and assign them to perm.
  • Set samples equal to the difference in means of permuted_A_datasets and permuted_B_datasets. Use axis=1 to get one mean per permuted dataset rather than a single overall mean.
  • Set test_stat equal to the difference in means of donations_A and donations_B.
  • Calculate the p-value p_val as twice the fraction of samples greater than or equal to the absolute value of test_stat.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Generate one permutation of the pooled indices per repetition (reps in total)
perm = np.array([np.random.____(len(____) + len(____)) for i in range(reps)])
permuted_A_datasets = data[perm[:, :len(donations_A)]]
permuted_B_datasets = data[perm[:, len(donations_A):]]

# Calculate the difference in means for each of the datasets
samples = np.mean(____, axis=1) - np.mean(____, axis=1)

# Calculate the test statistic and p-value
test_stat = ____
p_val = 2*np.sum(____ >= np.abs(____))/reps
print("p-value = {}".format(p_val))
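
The template above relies on names defined elsewhere in the course (data, reps, donations_A, donations_B). The following is a minimal, self-contained sketch of one way the steps might fit together. It assumes that data is the concatenation of the two donation arrays and uses made-up exponential samples in place of the real donations, which are not shown here, so the printed p-value says nothing about the actual exercise data.

```python
import numpy as np

np.random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical donation amounts; stand-ins for the course's real data.
donations_A = np.random.exponential(scale=6.0, size=500)
donations_B = np.random.exponential(scale=5.0, size=700)

# Assumptions: `data` pools both groups, `reps` is the number of permutations.
data = np.concatenate([donations_A, donations_B])
reps = 1000

# One random ordering of the pooled indices per repetition
n_total = len(donations_A) + len(donations_B)
perm = np.array([np.random.permutation(n_total) for _ in range(reps)])

# The first len(donations_A) indices play the role of group A, the rest group B
permuted_A_datasets = data[perm[:, :len(donations_A)]]
permuted_B_datasets = data[perm[:, len(donations_A):]]

# Null distribution: one difference in means per permuted dataset (row)
samples = np.mean(permuted_A_datasets, axis=1) - np.mean(permuted_B_datasets, axis=1)

# Observed difference in means and the two-sided p-value approximation
test_stat = np.mean(donations_A) - np.mean(donations_B)
p_val = 2 * np.sum(samples >= np.abs(test_stat)) / reps
print("p-value = {}".format(p_val))
```

Indexing data with the 2-D index array perm yields one permuted dataset per row, which is why the means are taken along axis=1. A common alternative for the two-sided p-value is np.mean(np.abs(samples) >= np.abs(test_stat)), which counts both tails directly instead of doubling one; that is a different estimator from the one the exercise asks for.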