Wilcoxon-Mann-Whitney
Another class of non-parametric hypothesis tests is the rank sum tests. Ranks are the positions of numeric values when sorted from smallest to largest. Think of them as finishing positions in a running event: whoever has the fastest (smallest) time is rank 1, second fastest is rank 2, and so on.
By calculating on the ranks of the data instead of the actual values, you can avoid making assumptions about the distribution of the test statistic. A rank-based test is more robust to outliers in the same way that a median is more robust than a mean.
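To make the idea of ranks concrete, here is a minimal sketch using pandas. The finishing times are made-up values for illustration only; they do not come from the course data.

```python
import pandas as pd

# Hypothetical finishing times in seconds (made-up values for illustration)
times = pd.Series([12.4, 10.9, 11.7, 10.9, 13.2])

# rank() gives the smallest value rank 1; tied values share the average of their ranks
ranks = times.rank()
print(ranks.tolist())

# Same runners, but the slowest time is now an extreme outlier
times_outlier = pd.Series([12.4, 10.9, 11.7, 10.9, 130.0])
ranks_outlier = times_outlier.rank()
print(ranks_outlier.tolist())
```

Both calls print the same ranks, because only the ordering of the values matters. That is the sense in which a calculation on ranks is insensitive to how extreme an outlier is.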
One common rank-based test is the Wilcoxon-Mann-Whitney test, which is like a non-parametric version of the unpaired t-test.
late_shipments is available, and the following packages have been loaded: pingouin and pandas as pd.
This exercise is part of the course Hypothesis Testing in Python.
Exercise instructions
- Select weight_kilograms and late from late_shipments, assigning the name weight_vs_late.
- Convert weight_vs_late from long-to-wide format, setting columns to 'late'.
- Run a Wilcoxon-Mann-Whitney test for a difference in weight_kilograms when the shipment was late and on-time.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Select the weight_kilograms and late columns
weight_vs_late = ____
# Convert weight_vs_late into wide format
weight_vs_late_wide = weight_vs_late.pivot(columns=____,
                                           values=____)
# Run a two-sided Wilcoxon-Mann-Whitney test on weight_kilograms vs. late
wmw_test = ____
# Print the test results
print(wmw_test)