Visualizing Test Statistics
In this exercise, you will approach the null hypothesis by comparing the distributions of a test statistic computed in two different ways.
First, you will examine two "populations", grouped by early and late times, and compute the test statistic distribution. Second, you will shuffle the two populations, so that the data are no longer time-ordered and each group contains a mix of early and late times, and then recompute the test statistic distribution.
To get you started, we've pre-loaded the two time duration groups, group_duration_short and group_duration_long, and two functions, shuffle_and_split() and plot_test_statistic().
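The course does not show the body of shuffle_and_split(), so the following is only a minimal sketch of what such a helper might do, assuming it pools both groups, randomly permutes the pooled values, and cuts the result in half. The seed parameter and the toy arrays early and late are hypothetical and are not part of the pre-loaded environment.

```python
import numpy as np

def shuffle_and_split(sample1, sample2, seed=None):
    """Hypothetical stand-in for the pre-loaded helper (an assumption,
    not the course's actual implementation)."""
    rng = np.random.default_rng(seed)
    # Pool the two groups so group membership is forgotten
    pooled = np.concatenate([sample1, sample2])
    # Randomly reorder the pooled values
    shuffled = rng.permutation(pooled)
    # Cut the shuffled pool into two halves
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Toy data: clearly separated "early" and "late" values
early = np.arange(0, 10)
late = np.arange(100, 110)
half1, half2 = shuffle_and_split(early, late, seed=0)
```

After the shuffle, each half is a random draw from the pooled values rather than a purely early or purely late group, which is the situation the null hypothesis describes.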
This exercise is part of the course
Introduction to Linear Modeling in Python
Instructions
- Use np.random.choice() to resample group_duration_short and group_duration_long, and difference the resamples to compute test_statistic_unshuffled.
- Use shuffle_and_split() on the original group_duration_short and group_duration_long (specified in this order) to create two new mixed populations.
- Resample the shuffled populations, and subtract resample_half1 from resample_half2 to compute a new test_statistic_shuffled.
- Use plot_test_statistic() to plot both test statistic distributions, and compare visually.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# From the unshuffled groups, compute the test statistic distribution
resample_short = np.random.choice(____, size=500, replace=____)
resample_long = np.random.choice(____, size=500, replace=____)
test_statistic_unshuffled = ____ - ____
# Shuffle two populations, cut in half, and recompute the test statistic
shuffled_half1, shuffled_half2 = shuffle_and_split(____, ____)
resample_half1 = np.random.choice(____, size=500, replace=____)
resample_half2 = np.random.choice(____, size=500, replace=____)
test_statistic_shuffled = resample_half2 - resample_half1
# Plot both the unshuffled and shuffled results and compare
fig = plot_test_statistic(____, label='Unshuffled')
fig = plot_test_statistic(____, label='Shuffled')
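
For readers without access to the pre-loaded environment, the comparison can be sketched on synthetic data. The block below is an illustration under stated assumptions, not the exercise solution: the groups group_short and group_long are made-up normal samples, the shuffle is done inline with a permutation of the pooled values, and the means are printed instead of calling plot_test_statistic(), whose implementation is not shown in the course. It uses NumPy's Generator API rather than np.random.choice() only to make the run reproducible.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up stand-ins for the pre-loaded groups (assumption: the two
# groups differ clearly in their typical value)
group_short = rng.normal(loc=10.0, scale=2.0, size=1000)
group_long = rng.normal(loc=20.0, scale=2.0, size=1000)

# Unshuffled: resample each group with replacement, then difference
resample_short = rng.choice(group_short, size=500, replace=True)
resample_long = rng.choice(group_long, size=500, replace=True)
stat_unshuffled = resample_long - resample_short

# Shuffled: pool both groups, permute, and cut the pool in half
pooled = rng.permutation(np.concatenate([group_short, group_long]))
half1, half2 = pooled[:1000], pooled[1000:]
resample_half1 = rng.choice(half1, size=500, replace=True)
resample_half2 = rng.choice(half2, size=500, replace=True)
stat_shuffled = resample_half2 - resample_half1

# Compare the centers of the two test statistic distributions
print("unshuffled mean:", stat_unshuffled.mean())
print("shuffled mean:  ", stat_shuffled.mean())
```

With groups this well separated, the unshuffled distribution should center near the true difference between the groups, while the shuffled distribution should center near zero, because shuffling removes any systematic difference between the two halves.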