Null Hypothesis
In this exercise, we formulate the null hypothesis as: short and long time durations have no effect on total distance traveled.
We interpret "zero effect size" to mean the following: if we shuffle the samples between the short and long groups, so that each of the two new samples contains a mix of short- and long-duration trips, and then compute the test statistic, it will be zero on average.
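The interpretation above can be illustrated with a minimal sketch. The data here are synthetic stand-ins (the names `short`, `long_`, the distribution parameters, and the 200 repetitions are assumptions made for illustration, not values from the exercise). The sketch compares the mean difference between the original groups with the average mean difference obtained after pooling and shuffling.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical stand-in data: distances for short- and long-duration trips.
short = rng.normal(loc=10.0, scale=2.0, size=1000)
long_ = rng.normal(loc=30.0, scale=2.0, size=1000)

# Un-shuffled groups: the mean difference reflects the effect of duration.
observed = np.mean(long_) - np.mean(short)

# Shuffled groups: pool everything, permute, and split in half, many times.
pooled = np.concatenate((short, long_))
shuffled_diffs = []
for _ in range(200):
    permuted = rng.permutation(pooled)
    half = len(permuted) // 2
    shuffled_diffs.append(np.mean(permuted[half:]) - np.mean(permuted[:half]))

mean_shuffled = np.mean(shuffled_diffs)
print("observed difference:", observed)
print("mean shuffled difference:", mean_shuffled)
```

Under these assumptions, the observed difference stays close to the gap built into the synthetic data, while the shuffled difference averages out near zero, which is what the null hypothesis predicts.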
In this exercise, your goal is to perform the shuffling and resampling. Start with the predefined group_duration_short and group_duration_long, which are the un-shuffled time duration groups.
This exercise is part of the course Introduction to Linear Modeling in Python.
Exercise instructions
- Use np.concatenate() to combine the two populations, and then use np.random.shuffle() to shuffle the values inside that container.
- Slice the shuffle_bucket in half and use np.random.choice() to resample each half (shuffled_half1 and shuffled_half2).
- Compute the test_statistic by subtracting resample_half1 from resample_half2.
- Compute the effect_size as the np.mean() of the test_statistic, and print the result.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Shuffle the time-ordered distances, then slice the result into two populations.
shuffle_bucket = np.____((group_duration_short, group_duration_long))
np.random.shuffle(____)
slice_index = len(shuffle_bucket)//2
shuffled_half1 = shuffle_bucket[0:____]
shuffled_half2 = shuffle_bucket[____:]
# Create new samples from each shuffled population, and compute the test statistic
resample_half1 = np.random.choice(____, size=500, replace=____)
resample_half2 = np.random.choice(____, size=500, replace=____)
test_statistic = ____ - ____
# Compute and print the effect size
effect_size = np.mean(____)
print('Test Statistic, after shuffling, mean = {}'.format(____))
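For reference, one possible way to fill in the blanks is sketched below. It is not the official solution. Two things are assumptions: the two input arrays are synthetic stand-ins (in the exercise they are predefined), and `replace=True` is chosen because resampling with replacement is the usual bootstrap convention; the exercise text does not state which value is expected.

```python
import numpy as np

# Hypothetical stand-in data; in the exercise these arrays are predefined.
np.random.seed(42)
group_duration_short = np.random.normal(loc=10.0, scale=2.0, size=1000)
group_duration_long = np.random.normal(loc=30.0, scale=2.0, size=1000)

# Pool the two groups, shuffle in place, then slice into two halves.
shuffle_bucket = np.concatenate((group_duration_short, group_duration_long))
np.random.shuffle(shuffle_bucket)
slice_index = len(shuffle_bucket) // 2
shuffled_half1 = shuffle_bucket[0:slice_index]
shuffled_half2 = shuffle_bucket[slice_index:]

# Resample each half (assumption: with replacement) and take the difference.
resample_half1 = np.random.choice(shuffled_half1, size=500, replace=True)
resample_half2 = np.random.choice(shuffled_half2, size=500, replace=True)
test_statistic = resample_half2 - resample_half1

# Compute and print the effect size.
effect_size = np.mean(test_statistic)
print('Test Statistic, after shuffling, mean = {}'.format(effect_size))
```

Note that np.random.shuffle() works in place and returns None, so its result should not be assigned back to shuffle_bucket.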