Generating a permutation sample
In the video, you learned that permutation sampling is a great way to simulate the hypothesis that two variables have identical probability distributions. This is often a hypothesis you want to test, so in this exercise, you will write a function to generate a permutation sample from two data sets.
Remember, a permutation sample of two arrays having respectively n1 and n2 entries is constructed by concatenating the arrays together, scrambling the contents of the concatenated array, and then taking the first n1 entries as the permutation sample of the first array and the last n2 entries as the permutation sample of the second array.
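To make the construction concrete, here is a minimal sketch of the three steps on two small arrays. The values, the seed, and the names combined and scrambled are assumptions chosen purely for illustration; they are not part of the exercise.

```python
import numpy as np

# Hypothetical toy data: n1 = 3 entries and n2 = 2 entries
data1 = np.array([1.0, 2.0, 3.0])
data2 = np.array([10.0, 20.0])

# Seed chosen only so the illustration is repeatable
np.random.seed(42)

# Step 1: concatenate the two arrays (note the single tuple argument)
combined = np.concatenate((data1, data2))

# Step 2: scramble the contents of the concatenated array
scrambled = np.random.permutation(combined)

# Step 3: the first n1 entries form one sample, the remaining n2 the other
perm_sample_1 = scrambled[:len(data1)]
perm_sample_2 = scrambled[len(data1):]

print(perm_sample_1, perm_sample_2)
```

Whatever the shuffle produces, the two samples keep the original sizes n1 and n2, and together they contain exactly the pooled values.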
This is a part of the course “Statistical Thinking in Python (Part 2)”
Exercise instructions
- Concatenate the two input arrays into one using np.concatenate(). Be sure to pass in data1 and data2 as one argument (data1, data2).
- Use np.random.permutation() to permute the concatenated array.
- Store the first len(data1) entries of permuted_data as perm_sample_1 and the last len(data2) entries of permuted_data as perm_sample_2. In practice, this can be achieved by using :len(data1) and len(data1): to slice permuted_data.
- Return perm_sample_1 and perm_sample_2.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
def permutation_sample(data1, data2):
    """Generate a permutation sample from two data sets."""
    # Concatenate the data sets: data
    data = ____
    # Permute the concatenated array: permuted_data
    permuted_data = ____
    # Split the permuted array into two: perm_sample_1, perm_sample_2
    perm_sample_1 = permuted_data[____]
    perm_sample_2 = permuted_data[____]
    return perm_sample_1, perm_sample_2
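
For reference, one possible way to fill in the blanks of the scaffold above, following the exercise instructions, is sketched here. It is not necessarily the course's official solution, and the input arrays a and b in the usage lines are hypothetical values used only to check the result.

```python
import numpy as np


def permutation_sample(data1, data2):
    """Generate a permutation sample from two data sets."""
    # Concatenate the data sets: data (both arrays passed as one tuple)
    data = np.concatenate((data1, data2))
    # Permute the concatenated array: permuted_data
    permuted_data = np.random.permutation(data)
    # Split the permuted array into two: perm_sample_1, perm_sample_2
    perm_sample_1 = permuted_data[:len(data1)]
    perm_sample_2 = permuted_data[len(data1):]
    return perm_sample_1, perm_sample_2


# Hypothetical inputs, used only to check the shapes of the output
a = np.array([4.1, 5.3, 6.0, 7.2])
b = np.array([1.5, 2.5, 3.5])
sample_a, sample_b = permutation_sample(a, b)
print(len(sample_a), len(sample_b))
```

Slicing with len(data1): picks up everything after the first len(data1) entries, which is exactly the last len(data2) entries, so the second sample never needs len(data2) explicitly.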