Get startedGet started for free

Hypothesis test on Pearson correlation

The observed correlation between female illiteracy and fertility may just be by chance; the fertility of a given country may actually be totally independent of its illiteracy. You will test this hypothesis. To do so, permute the illiteracy values but leave the fertility values fixed. This simulates the hypothesis that they are totally independent of each other. For each permutation, compute the Pearson correlation coefficient and assess how many of your permutation replicates have a Pearson correlation coefficient greater than the observed one.

The function pearson_r() that you wrote in the prequel to this course for computing the Pearson correlation coefficient is already available for you.

This exercise is part of the course

Statistical Thinking in Python (Part 2)

View Course

Exercise instructions

  • Compute the observed Pearson correlation between illiteracy and fertility.
  • Initialize an array to store your permutation replicates.
  • Write a for loop to draw 10,000 replicates:
    • Permute the illiteracy measurements using np.random.permutation().
    • Compute the Pearson correlation between the permuted illiteracy array, illiteracy_permuted, and fertility.
  • Compute and print the p-value from the replicates.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Compute observed correlation: r_obs
r_obs = ____

# Initialize permutation replicates: perm_replicates
perm_replicates = np.empty(10000)

# Draw replicates
for ____ in ____:
    # Permute illiteracy measurments: illiteracy_permuted
    illiteracy_permuted = ____

    # Compute Pearson correlation
    perm_replicates[i] = ____

# Compute p-value: p
p = ____
print('p-val =', p)
Edit and Run Code