Generating a permutation sample

In the video, you learned that permutation sampling is a great way to simulate the hypothesis that two variables have identical probability distributions. This is often a hypothesis you want to test, so in this exercise, you will write a function to generate a permutation sample from two data sets.

Remember, a permutation sample of two arrays having respectively n1 and n2 entries is constructed by concatenating the arrays together, scrambling the contents of the concatenated array, and then taking the first n1 entries as the permutation sample of the first array and the last n2 entries as the permutation sample of the second array.
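
As a small illustration of the construction described above, here is a sketch on toy data. The array names, their values, and the seed are made up for this example and are not part of the exercise; NumPy is assumed to be imported as np.

```python
import numpy as np

# Hypothetical toy data: n1 = 3 entries and n2 = 2 entries
first = np.array([1.0, 2.0, 3.0])
second = np.array([10.0, 20.0])

# Arbitrary seed, used only so the example is reproducible
np.random.seed(42)

# Concatenate the two arrays, then scramble the pooled values
pooled = np.concatenate((first, second))
scrambled = np.random.permutation(pooled)

# First n1 entries form the permutation sample of the first array,
# the remaining n2 entries form the permutation sample of the second
perm_first = scrambled[:len(first)]
perm_second = scrambled[len(first):]
```

The two permutation samples keep the original sizes n1 and n2, and together they contain exactly the pooled values, only reassigned between the two groups.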

This is a part of the course

“Statistical Thinking in Python (Part 2)”

Exercise instructions

  • Concatenate the two input arrays into one using np.concatenate(). Be sure to pass data1 and data2 together as a single tuple argument, (data1, data2).
  • Use np.random.permutation() to permute the concatenated array.
  • Store the first len(data1) entries of permuted_data as perm_sample_1 and the last len(data2) entries of permuted_data as perm_sample_2. In practice, this can be achieved by using :len(data1) and len(data1): to slice permuted_data.
  • Return perm_sample_1 and perm_sample_2.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

def permutation_sample(data1, data2):
    """Generate a permutation sample from two data sets."""

    # Concatenate the data sets: data
    data = ____

    # Permute the concatenated array: permuted_data
    permuted_data = ____

    # Split the permuted array into two: perm_sample_1, perm_sample_2
    perm_sample_1 = permuted_data[____]
    perm_sample_2 = permuted_data[____]

    return perm_sample_1, perm_sample_2
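
One possible way to fill in the blanks, following the instructions above, is sketched here. It assumes NumPy is imported as np; the example arrays a and b and the seed are hypothetical and are not data from the course.

```python
import numpy as np


def permutation_sample(data1, data2):
    """Generate a permutation sample from two data sets."""

    # Concatenate the data sets; np.concatenate() takes one tuple argument
    data = np.concatenate((data1, data2))

    # Permute the concatenated array (returns a shuffled copy)
    permuted_data = np.random.permutation(data)

    # Split the permuted array into two pieces of the original sizes
    perm_sample_1 = permuted_data[:len(data1)]
    perm_sample_2 = permuted_data[len(data1):]

    return perm_sample_1, perm_sample_2


# Hypothetical example data, not taken from the course
np.random.seed(0)
a = np.array([1.2, 0.8, 1.5, 1.1])
b = np.array([2.3, 1.9, 2.7])

perm_a, perm_b = permutation_sample(a, b)
print(len(perm_a), len(perm_b))  # prints: 4 3
```

Slicing with len(data1): takes everything after the first len(data1) entries, which is the same as the last len(data2) entries because the concatenated array has len(data1) + len(data2) entries in total.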

Learn to perform the two key tasks in statistical inference: parameter estimation and hypothesis testing.

You now know how to define and estimate parameters given a model. But the question remains: how reasonable is it to observe your data if a model is true? This question is addressed by hypothesis tests. They are the icing on the inference cake. After completing this chapter, you will be able to carefully construct and test hypotheses using hacker statistics.

  • Exercise 1: Formulating and simulating a hypothesis
  • Exercise 2: Generating a permutation sample
  • Exercise 3: Visualizing permutation sampling
  • Exercise 4: Test statistics and p-values
  • Exercise 5: Test statistics
  • Exercise 6: What is a p-value?
  • Exercise 7: Generating permutation replicates
  • Exercise 8: Look before you leap: EDA before hypothesis testing
  • Exercise 9: Permutation test on frog data
  • Exercise 10: Bootstrap hypothesis tests
  • Exercise 11: A one-sample bootstrap hypothesis test
  • Exercise 12: A two-sample bootstrap hypothesis test for difference of means
