Visualizing permutation sampling

To help see how permutation sampling works, in this exercise you will generate permutation samples and look at them graphically.

We will use the Sheffield Weather Station data again, this time considering the monthly rainfall in June (a dry month) and November (a wet month). We expect these might be differently distributed, so we will take permutation samples to see how their ECDFs would look if they were identically distributed.

The data are stored in the NumPy arrays rain_june and rain_november.

As a reminder, permutation_sample() has a function signature of permutation_sample(data_1, data_2) with a return value of permuted_data[:len(data_1)], permuted_data[len(data_1):], where permuted_data = np.random.permutation(np.concatenate((data_1, data_2))).

Este exercício faz parte do curso

Statistical Thinking in Python (Part 2)

Ver Curso

Instruções de exercício

  • Write a for loop to generate 50 permutation samples, compute their ECDFs, and plot them.
    • Generate a permutation sample pair from rain_june and rain_november using your permutation_sample() function.
    • Generate the x and y values for an ECDF for each of the two permutation samples for the ECDF using your ecdf() function.
    • Plot the ECDF of the first permutation sample (x_1 and y_1) as dots. Do the same for the second permutation sample (x_2 and y_2).
  • Generate x and y values for ECDFs for the rain_june and rain_november data and plot the ECDFs using respectively the keyword arguments color='red' and color='blue'.
  • Label your axes, set a 2% margin, and show your plot. This has been done for you, so just hit submit to view the plot!

Exercício interativo prático

Experimente este exercício preenchendo este código de exemplo.

for _ in ____:
    # Generate permutation samples
    perm_sample_1, perm_sample_2 = ____


    # Compute ECDFs
    x_1, y_1 = ____
    x_2, y_2 = ____

    # Plot ECDFs of permutation sample
    _ = plt.plot(____, ____, marker='.', linestyle='none',
                 color='red', alpha=0.02)
    _ = plt.plot(____, ____, marker='.', linestyle='none',
                 color='blue', alpha=0.02)

# Create and plot ECDFs from original data
x_1, y_1 = ____
x_2, y_2 = ____
_ = plt.plot(x_1, y_1, marker='.', linestyle='none', color='red')
_ = plt.plot(x_2, y_2, marker='.', linestyle='none', color='blue')

# Label axes, set margin, and show plot
plt.margins(0.02)
_ = plt.xlabel('monthly rainfall (mm)')
_ = plt.ylabel('ECDF')
plt.show()