
Visualizing permutation sampling

To help see how permutation sampling works, in this exercise you will generate permutation samples and look at them graphically.

We will use the Sheffield Weather Station data again, this time considering the monthly rainfall in July (a dry month) and November (a wet month). We expect these might be differently distributed, so we will take permutation samples to see how their ECDFs would look if they were identically distributed.

The data are stored in the NumPy arrays rain_july and rain_november.

As a reminder, permutation_sample() has the function signature permutation_sample(data_1, data_2) and returns the pair permuted_data[:len(data_1)], permuted_data[len(data_1):], where permuted_data = np.random.permutation(np.concatenate((data_1, data_2))).
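The reminder above can be written out as a short function. This is a minimal sketch that follows the stated signature and return value; the toy arrays a and b are made up for illustration and are not the Sheffield data.

```python
import numpy as np

def permutation_sample(data_1, data_2):
    """Return a permutation sample pair from two data sets."""
    # Pool the two data sets into one array
    data = np.concatenate((data_1, data_2))

    # Shuffle the pooled values
    permuted_data = np.random.permutation(data)

    # Split the shuffled values back into two samples of the original sizes
    perm_sample_1 = permuted_data[:len(data_1)]
    perm_sample_2 = permuted_data[len(data_1):]

    return perm_sample_1, perm_sample_2

# Toy values for illustration only (hypothetical, not the rainfall data)
a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0, 40.0])
perm_a, perm_b = permutation_sample(a, b)
```

Each call reshuffles the pooled values, so the two returned samples keep the sizes of the inputs but mix their contents, which is what "identically distributed" looks like for these data.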

This exercise is part of the course

Statistical Thinking in Python (Part 2)


Exercise instructions

  • Write a for loop to generate 50 permutation samples, compute their ECDFs, and plot them.
    • Generate a permutation sample pair from rain_july and rain_november using your permutation_sample() function.
    • Generate the x and y values for an ECDF for each of the two permutation samples using your ecdf() function.
    • Plot the ECDF of the first permutation sample as dots using the color='red' and alpha=0.02 keyword arguments. Do the same for the second permutation sample using the color='blue' and alpha=0.02 keyword arguments.
  • Generate x and y values for ECDFs for the rain_july and rain_november data and plot the ECDFs using the keyword arguments color='red' and color='blue', respectively.
  • Label your axes, set a 2% margin, and show your plot.
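The steps above rely on an ecdf() function that is not shown in this exercise. The following is a minimal sketch of one common way to write it, assuming it takes a one-dimensional array and returns the sorted data as x and the cumulative fractions as y; the version you wrote earlier in the course may differ. The toy array is hypothetical.

```python
import numpy as np

def ecdf(data):
    """Compute x, y values of an ECDF for a one-dimensional array.

    Assumed implementation: this exercise does not define ecdf().
    """
    # Number of data points
    n = len(data)

    # x values: the data sorted in ascending order
    x = np.sort(data)

    # y values: fraction of points less than or equal to each x
    y = np.arange(1, n + 1) / n

    return x, y

# Toy values for illustration only (hypothetical, not the rainfall data)
toy = np.array([3.0, 1.0, 2.0, 4.0])
x, y = ecdf(toy)
```

The returned x and y can be passed directly to plt.plot() with marker='.' and linestyle='none' to draw the ECDF as dots.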

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

for _ in ____:
    # Generate permutation samples
    perm_sample_1, perm_sample_2 = ____


    # Compute ECDFs
    x_1, y_1 = ____
    x_2, y_2 = ____

    # Plot ECDFs of permutation sample
    _ = plt.plot(____, ____, marker='.', linestyle='none',
                 color='red', alpha=0.02)
    _ = plt.plot(____, ____, marker='.', linestyle='none',
                 color='blue', alpha=0.02)

# Create and plot ECDFs from original data
x_1, y_1 = ____
x_2, y_2 = ____
_ = plt.plot(x_1, y_1, marker='.', linestyle='none', color='red')
_ = plt.plot(x_2, y_2, marker='.', linestyle='none', color='blue')

# Label axes, set margin, and show plot
plt.margins(0.02)
_ = plt.xlabel('monthly rainfall (mm)')
_ = plt.ylabel('ECDF')
plt.show()