Hypothesis test on Pearson correlation
The observed correlation between female illiteracy and fertility may be due to chance alone; the fertility of a given country may actually be completely independent of its illiteracy. You will test this hypothesis. To do so, permute the illiteracy values but leave the fertility values fixed. This simulates the hypothesis that the two quantities are independent of each other. For each permutation, compute the Pearson correlation coefficient, then assess how many of your permutation replicates have a Pearson correlation coefficient greater than the observed one.
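To see why permuting one variable simulates independence, the following minimal sketch uses made-up toy data (not the course's illiteracy and fertility measurements). The names x, y, and x_permuted are hypothetical. Permuting x keeps exactly the same set of values, but it scrambles which x is paired with which y, so any real association is broken.

```python
import numpy as np

np.random.seed(42)

# Hypothetical toy data with a strong linear relationship (assumption, not course data)
x = np.arange(10, dtype=float)
y = 2.0 * x + np.random.normal(scale=0.5, size=x.size)

# Shuffle x; y stays fixed, exactly as in the exercise
x_permuted = np.random.permutation(x)

# The permuted array holds the same values, only in a different order
same_values = np.array_equal(np.sort(x_permuted), np.sort(x))

# Correlation before and after breaking the pairing
r_original = np.corrcoef(x, y)[0, 1]
r_permuted = np.corrcoef(x_permuted, y)[0, 1]

print('same values:', same_values)
print('r (original pairing):', r_original)
print('r (permuted pairing):', r_permuted)
```

The permuted correlation is one draw from the distribution you would expect if the two variables were unrelated; repeating the shuffle many times builds up that distribution.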
The function pearson_r() that you wrote in the prequel to this course for computing the Pearson correlation coefficient is already available for you.
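The body of pearson_r() is not shown on this page. If you want to run the exercise outside the course environment, one plausible stand-in, assuming the function simply takes two 1-D arrays and returns a single number, is to read the off-diagonal entry of the matrix returned by np.corrcoef():

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays.

    A stand-in sketch; the course's own implementation may differ.
    """
    # np.corrcoef returns the 2x2 correlation matrix [[1, r], [r, 1]]
    corr_mat = np.corrcoef(x, y)
    return corr_mat[0, 1]

# Perfectly linear toy data should give a coefficient of essentially 1
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
# Perfectly anti-correlated toy data should give essentially -1
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(r_up, r_down)
```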
This exercise is part of the course Statistical Thinking in Python (Part 2).
Exercise instructions
- Compute the observed Pearson correlation between illiteracy and fertility.
- Initialize an array to store your permutation replicates.
- Write a for loop to draw 10,000 replicates:
  - Permute the illiteracy measurements using np.random.permutation().
  - Compute the Pearson correlation between the permuted illiteracy array, illiteracy_permuted, and fertility.
- Compute and print the p-value from the replicates.
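The steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the exercise's reference solution: the data are synthetic stand-ins generated on the spot, pearson_r() is a stand-in built on np.corrcoef(), the helper name permutation_pvalue is hypothetical, and the comparison uses "at least as large as" (>=), a common convention that gives essentially the same answer as a strict "greater than" for continuous data.

```python
import numpy as np

def pearson_r(x, y):
    """Stand-in for the course's pearson_r(): off-diagonal of the correlation matrix."""
    return np.corrcoef(x, y)[0, 1]

def permutation_pvalue(x, y, size=10000):
    """Hypothetical helper: one-sided permutation p-value for a positive correlation."""
    # Observed correlation on the original pairing
    r_obs = pearson_r(x, y)

    # Array that will hold one correlation per permutation
    perm_replicates = np.empty(size)

    for i in range(size):
        # Shuffle x only; y stays fixed
        x_permuted = np.random.permutation(x)
        perm_replicates[i] = pearson_r(x_permuted, y)

    # Fraction of replicates at least as large as the observed value
    p = np.sum(perm_replicates >= r_obs) / size
    return r_obs, p

np.random.seed(0)

# Synthetic stand-in data (assumption): a positive trend plus noise
illiteracy = np.random.uniform(0, 80, size=50)
fertility = 1.5 + 0.05 * illiteracy + np.random.normal(scale=0.8, size=50)

# Fewer replicates than the exercise's 10,000, only to keep the sketch quick
r_obs, p = permutation_pvalue(illiteracy, fertility, size=2000)
print('r_obs =', r_obs)
print('p-val =', p)
```

When the observed correlation is strong, it is common for no replicate to reach it, in which case the estimate comes out as 0. That is best read as "smaller than 1 divided by the number of replicates" rather than as exactly zero.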
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Compute observed correlation: r_obs
r_obs = ____
# Initialize permutation replicates: perm_replicates
perm_replicates = np.empty(10000)
# Draw replicates
for ____ in ____:
    # Permute illiteracy measurements: illiteracy_permuted
    illiteracy_permuted = ____
    # Compute Pearson correlation
    perm_replicates[i] = ____
# Compute p-value: p
p = ____
print('p-val =', p)