Exercise

Test of two proportions

You may wonder if the amount paid for freight affects whether or not the shipment was late. Recall that in the late_shipments dataset, whether or not the shipment was late is stored in the late column. Freight costs are stored in the freight_cost_group column, and the categories are "expensive" and "reasonable".

The hypotheses to test, with "late" denoting the proportion of late shipments in each group, are:

\(H_{0}\): \(late_{\text{expensive}} - late_{\text{reasonable}} = 0\)

\(H_{A}\): \(late_{\text{expensive}} - late_{\text{reasonable}} > 0\)

p_hats contains the estimates of population proportions (sample proportions) for each freight_cost_group:

freight_cost_group  late
expensive           Yes     0.082569
reasonable          Yes     0.035165
Name: late, dtype: float64

ns contains the sample sizes for these groups:

freight_cost_group
expensive     545
reasonable    455
Name: late, dtype: int64
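For orientation, here is a minimal sketch of how Series shaped like p_hats and ns could be derived with pandas. The late_shipments data is not reproduced here, so the toy DataFrame below and the names toy, toy_p_hats, and toy_ns are hypothetical stand-ins, and the exercise may have built its inputs differently.

```python
import pandas as pd

# Hypothetical toy data standing in for late_shipments (not the real dataset).
toy = pd.DataFrame({
    "freight_cost_group": ["expensive"] * 4 + ["reasonable"] * 5,
    "late": ["Yes", "No", "No", "No", "Yes", "No", "No", "No", "No"],
})

# Proportion of each "late" value within each freight cost group.
props = toy.groupby("freight_cost_group")["late"].value_counts(normalize=True)

# Keep only the "Yes" rows, so the result is indexed by (group, "Yes").
toy_p_hats = props[props.index.get_level_values(1) == "Yes"]

# Sample size per group.
toy_ns = toy.groupby("freight_cost_group")["late"].count()

print(toy_p_hats)
print(toy_ns)
```

With this toy data, the late proportions are 1/4 for "expensive" and 1/5 for "reasonable", with group sizes 4 and 5.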

pandas and numpy have been imported under their usual aliases, and norm is available from scipy.stats.

Instructions 1/4

  • Calculate the pooled sample proportion, \(\hat{p}\), from p_hats and ns.

$$ \hat{p} = \frac{n_{\text{expensive}} \times \hat{p}_{\text{expensive}} + n_{\text{reasonable}} \times \hat{p}_{\text{reasonable}}}{n_{\text{expensive}} + n_{\text{reasonable}}} $$
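One way the formula above might be applied is sketched below. To keep the block self-contained, p_hats and ns are rebuilt from the values printed in this exercise (in the exercise environment they already exist), and the intermediate names p_expensive, p_reasonable, n_expensive, and n_reasonable are illustrative rather than prescribed.

```python
import pandas as pd

# Rebuild the inputs from the printed values; in the exercise they are preloaded.
p_hats = pd.Series(
    [0.082569, 0.035165],
    index=pd.MultiIndex.from_tuples(
        [("expensive", "Yes"), ("reasonable", "Yes")],
        names=["freight_cost_group", "late"],
    ),
    name="late",
)
ns = pd.Series(
    [545, 455],
    index=pd.Index(["expensive", "reasonable"], name="freight_cost_group"),
    name="late",
)

# Pull out the sample proportion of late shipments and the size of each group.
p_expensive = p_hats.loc[("expensive", "Yes")]
p_reasonable = p_hats.loc[("reasonable", "Yes")]
n_expensive = ns.loc["expensive"]
n_reasonable = ns.loc["reasonable"]

# Pooled proportion: a sample-size-weighted average of the two group proportions.
p_hat = (n_expensive * p_expensive + n_reasonable * p_reasonable) / (
    n_expensive + n_reasonable
)

print(round(p_hat, 4))  # approximately 0.061
```

Because the weights are the group sizes, the pooled estimate equals the overall share of late shipments across both groups combined: about 61 late shipments out of 1,000.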