Test of two proportions
You may wonder if the amount paid for freight affects whether or not the shipment was late. Recall that in the late_shipments dataset, whether or not the shipment was late is stored in the late column. Freight costs are stored in the freight_cost_group column, and the categories are "expensive" and "reasonable".
The hypotheses to test, with "late" corresponding to the proportion of late shipments for that group, are:
\(H_{0}\): \(late_{\text{expensive}} - late_{\text{reasonable}} = 0\)
\(H_{A}\): \(late_{\text{expensive}} - late_{\text{reasonable}} > 0\)
p_hats contains the estimates of population proportions (sample proportions) for each freight_cost_group:
freight_cost_group  late
expensive           Yes     0.082569
reasonable          Yes     0.035165
Name: late, dtype: float64
ns contains the sample sizes for these groups:
freight_cost_group
expensive     545
reasonable    455
Name: late, dtype: int64
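To follow along outside the exercise environment, the two objects can be rebuilt by hand. This is only a sketch: the values are copied from the printed output above, and the index layout is an assumption inferred from that output rather than the code that actually produced p_hats and ns from late_shipments.

```python
import pandas as pd

# Sample proportions of late shipments per freight cost group.
# Values are hard-coded from the printed output (assumption: the index has
# two levels, freight_cost_group and late).
p_hats = pd.Series(
    [0.082569, 0.035165],
    index=pd.MultiIndex.from_tuples(
        [("expensive", "Yes"), ("reasonable", "Yes")],
        names=["freight_cost_group", "late"],
    ),
    name="late",
)

# Sample sizes per freight cost group, also hard-coded from the output.
ns = pd.Series(
    [545, 455],
    index=pd.Index(["expensive", "reasonable"], name="freight_cost_group"),
    name="late",
)

print(p_hats)
print(ns)
```

With this layout, a single proportion is selected with a tuple such as p_hats[("expensive", "Yes")], and a sample size with ns["expensive"].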
pandas and numpy have been imported under their usual aliases, and norm is available from scipy.stats.
This exercise is part of the course Hypothesis Testing in Python.
Have a go at this exercise by completing this sample code.
# Calculate the pooled estimate of the population proportion
p_hat = ____
# Print the result
print(p_hat)
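
One way to fill in the blank, offered as a sketch rather than the course's reference solution: the pooled estimate is the average of the two sample proportions weighted by sample size, that is, (n_expensive * p_expensive + n_reasonable * p_reasonable) / (n_expensive + n_reasonable). The scalar variable names below are illustrative and do not appear in the exercise; the numbers are copied from the printed p_hats and ns output.

```python
# Sample proportions and sample sizes, copied from the printed output.
# These scalar names are hypothetical stand-ins for entries of p_hats and ns.
p_hat_expensive, n_expensive = 0.082569, 545
p_hat_reasonable, n_reasonable = 0.035165, 455

# Pooled estimate: sample-size-weighted mean of the two sample proportions.
p_hat = (
    n_expensive * p_hat_expensive + n_reasonable * p_hat_reasonable
) / (n_expensive + n_reasonable)

print(p_hat)
```

The result is close to 0.061, which matches roughly 61 late shipments among the 1000 in the two groups combined.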