Visualizing Test Statistics
In this exercise, you will approach the null hypothesis by comparing the distributions of a test statistic computed in two different ways.
First, you will examine two "populations", grouped into early and late times, and compute the test statistic distribution. Second, you will shuffle the two populations so that the data is no longer time-ordered and each contains a mix of early and late times, and then recompute the test statistic distribution.
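The shuffling step can be sketched as follows. The course pre-loads its own shuffle_and_split(); the version below is a hypothetical stand-in, assuming the helper pools the two groups, shuffles the pooled values, and cuts the result in half.

```python
import numpy as np

def shuffle_and_split(sample1, sample2):
    """Hypothetical stand-in for the pre-loaded helper of the same name.

    Pools both samples, shuffles the pooled values in place (destroying any
    time ordering), and cuts the result into two halves.
    """
    pooled = np.concatenate([sample1, sample2])  # new array; inputs stay untouched
    np.random.shuffle(pooled)
    half_index = len(pooled) // 2
    return pooled[:half_index], pooled[half_index:]

# Usage with made-up "early" and "late" values
np.random.seed(0)
early = np.array([1.0, 2.0, 3.0, 4.0])
late = np.array([10.0, 11.0, 12.0, 13.0])
half1, half2 = shuffle_and_split(early, late)
print(half1, half2)
```

Because both halves are drawn from the same pooled values, any difference between them arises by chance alone, which is exactly the situation the null hypothesis describes.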
To get you started, we've pre-loaded the two time duration groups, group_duration_short and group_duration_long, and two functions, shuffle_and_split() and plot_test_statistic().
This exercise is part of the course Introduction to Linear Modeling in Python.
Exercise instructions
- Use np.random.choice() to resample (with replacement) group_duration_short and group_duration_long, then subtract resample_short from resample_long to compute test_statistic_unshuffled.
- Use shuffle_and_split() on the original group_duration_short and group_duration_long (passed in this order) to create two new mixed populations.
- Resample the shuffled populations, and subtract resample_half1 from resample_half2 to compute a new test_statistic_shuffled.
- Use plot_test_statistic() to plot both test statistic distributions, and compare them visually.
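The resample-and-difference step in the first instruction can be sketched as a small function. The name resampled_difference() is hypothetical and not part of the course code; it simply wraps two np.random.choice() calls with replace=True (bootstrap resampling) and returns the long-minus-short differences.

```python
import numpy as np

def resampled_difference(group_short, group_long, size=500):
    """Hypothetical helper: bootstrap both groups, return long-minus-short differences."""
    # replace=True lets each draw pick the same observation again (bootstrap resampling)
    resample_short = np.random.choice(group_short, size=size, replace=True)
    resample_long = np.random.choice(group_long, size=size, replace=True)
    return resample_long - resample_short

# Usage with constant made-up groups, so every difference is 3.0 - 1.0
np.random.seed(1)
diffs = resampled_difference(np.array([1.0, 1.0, 1.0]), np.array([3.0, 3.0]))
print(diffs[:5])
```

Each of the 500 entries is one draw from the test statistic distribution, so plotting a histogram of diffs is what visualizing that distribution amounts to.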
Have a go at this exercise by completing this sample code.
# From the unshuffled groups, compute the test statistic distribution
resample_short = np.random.choice(____, size=500, replace=____)
resample_long = np.random.choice(____, size=500, replace=____)
test_statistic_unshuffled = ____ - ____
# Shuffle two populations, cut in half, and recompute the test statistic
shuffled_half1, shuffled_half2 = shuffle_and_split(____, ____)
resample_half1 = np.random.choice(____, size=500, replace=____)
resample_half2 = np.random.choice(____, size=500, replace=____)
test_statistic_shuffled = resample_half2 - resample_half1
# Plot both the unshuffled and shuffled results and compare
fig = plot_test_statistic(____, label='Unshuffled')
fig = plot_test_statistic(____, label='Shuffled')
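For reference, one way to complete the scaffold end to end is sketched below. It cannot use the pre-loaded data or helpers, so it assumes synthetic normally distributed "durations" (hypothetical values, not the course data) and a hypothetical shuffle_and_split() that pools, shuffles, and halves the groups. In place of plot_test_statistic(), it prints each distribution's mean and central 95% interval.

```python
import numpy as np

np.random.seed(42)

# Hypothetical stand-in data; the real groups are pre-loaded in the exercise
group_duration_short = np.random.normal(loc=10.0, scale=2.0, size=200)
group_duration_long = np.random.normal(loc=12.0, scale=2.0, size=200)

def shuffle_and_split(sample1, sample2):
    # Hypothetical stand-in: pool both samples, shuffle, cut in half
    pooled = np.concatenate([sample1, sample2])
    np.random.shuffle(pooled)
    half = len(pooled) // 2
    return pooled[:half], pooled[half:]

# From the unshuffled groups, compute the test statistic distribution
resample_short = np.random.choice(group_duration_short, size=500, replace=True)
resample_long = np.random.choice(group_duration_long, size=500, replace=True)
test_statistic_unshuffled = resample_long - resample_short

# Shuffle the two populations, cut in half, and recompute the test statistic
shuffled_half1, shuffled_half2 = shuffle_and_split(group_duration_short, group_duration_long)
resample_half1 = np.random.choice(shuffled_half1, size=500, replace=True)
resample_half2 = np.random.choice(shuffled_half2, size=500, replace=True)
test_statistic_shuffled = resample_half2 - resample_half1

# Summarize both distributions numerically instead of plotting them
for label, stat in [("Unshuffled", test_statistic_unshuffled),
                    ("Shuffled", test_statistic_shuffled)]:
    low, high = np.percentile(stat, [2.5, 97.5])
    print(f"{label}: mean={stat.mean():.2f}, 95% interval=({low:.2f}, {high:.2f})")
```

With these made-up inputs, the unshuffled distribution should center near the 2.0 gap built into the synthetic data, while the shuffled one should center near zero. A clear separation like this is evidence against the null hypothesis that early and late durations do not differ.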