Significance of Difference of Proportions
Bike commuting is still uncommon, but Washington, DC, has a decent share of bike commuters. That share has increased by over 1 percentage point in the last few years, but is the increase statistically significant? In this exercise you will calculate the standard error of a proportion, then a two-sample Z-statistic of the proportions.
The formula for the standard error (SE) of a proportion is:
$$SE_P = \frac{1}{N}\sqrt{SE_n^2 - P^2SE_N^2}$$
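As a minimal sketch of this formula in plain Python, the helper below computes \(SE_P\) from two estimates and their standard errors. The function name `se_proportion` and the input numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

def se_proportion(n_est, total_est, se_n, se_total):
    """SE of the proportion P = n / N, following
    SE_P = (1 / N) * sqrt(SE_n^2 - P^2 * SE_N^2)."""
    p = n_est / total_est
    radicand = se_n**2 - p**2 * se_total**2
    # math.sqrt raises ValueError if the radicand is negative
    return math.sqrt(radicand) / total_est

# Hypothetical inputs: 500 of 10,000 workers, with SEs of 50 and 100
se_p = se_proportion(n_est=500, total_est=10_000, se_n=50, se_total=100)
print(round(se_p, 6))  # 0.004975
```

Here \(P = 0.05\), so the term under the root is \(50^2 - 0.05^2 \cdot 100^2 = 2475\).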
The formula for the two-sample Z-statistic is:
$$Z = \frac{x_1 - x_2}{\sqrt{SE_{x_1}^2 + SE_{x_2}^2}}$$
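The Z formula can be sketched the same way. The function name `two_sample_z` and the four input values are hypothetical; `Z_CRIT = 1.645` is the 90% critical value used in the exercise code.

```python
import math

Z_CRIT = 1.645  # critical Z score for 90% confidence

def two_sample_z(x1, x2, se_x1, se_x2):
    """Z = (x1 - x2) / sqrt(SE_x1^2 + SE_x2^2)."""
    return (x1 - x2) / math.sqrt(se_x1**2 + se_x2**2)

# Hypothetical shares of 5% and 3%, with SEs of 0.4 and 0.3 percentage points
z = two_sample_z(0.05, 0.03, 0.004, 0.003)
print(z > Z_CRIT)  # True, since z is about 4.0
```

With these inputs the denominator is \(\sqrt{0.004^2 + 0.003^2} = 0.005\), so a 2 percentage point difference gives \(Z \approx 4\).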
The DataFrame `dc` is loaded. It has columns (shown in the console) with estimates (ending `"_est"`) and margins of error (ending `"_moe"`) for total workers and bike commuters.
The `sqrt` function has been imported from the `numpy` module.
This exercise is part of the course
Analyzing US Census Data in Python
Exercise instructions
- Calculate `bike_share` by dividing the number of bikers by the total number of workers
- Calculate the SE of the estimates of bikers and total workers by dividing each MOE by `Z_CRIT`
- Calculate the SE of the proportions: `se_bike` is the SE of the subpopulation \(SE_n\), `bike_share` is the proportion \(P\), and `se_total` is the SE of the population \(SE_N\)
- Calculate \(Z\): \(x_1\) and \(x_2\) are the `bike_share` in 2017 and 2011; \(SE_{x_1}\) and \(SE_{x_2}\) are `se_p` in 2017 and 2011
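The MOE-to-SE step in the instructions is a single division. A minimal sketch, assuming the margins of error were published at the 90% confidence level that `Z_CRIT` corresponds to; the helper name `moe_to_se` and the sample value are hypothetical.

```python
Z_CRIT = 1.645  # critical Z score for 90% confidence

def moe_to_se(moe, z_crit=Z_CRIT):
    """Convert a margin of error to a standard error: SE = MOE / Z_CRIT."""
    return moe / z_crit

# Hypothetical margin of error of 329 workers
se = moe_to_se(329.0)
print(round(se, 1))  # 200.0
```

The same division works element-wise on a pandas column, which is how the exercise applies it.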
Interactive exercise
Try this exercise by completing the sample code.
# Set the critical Z score for 90% confidence
Z_CRIT = 1.645
# Calculate share of bike commuting
dc["bike_share"] = ____
# Calculate standard errors of the estimate from MOEs
dc["se_bike"] = ____
dc["se_total"] = ____
# Calculate the standard error of the proportion
dc["se_p"] = sqrt(____**2 - ____**2 * ____**2) / dc["total_est"]
# Calculate the two sample statistic between 2011 and 2017
Z = (dc[dc["year"] == 2017]["bike_share"] - ____) / \
    sqrt(____**2 + ____**2)
print(Z_CRIT < Z)
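For reference, here is an end-to-end sketch of the same steps on a small stand-in DataFrame. Everything in it is an assumption made for illustration: the figures are invented rather than real ACS estimates, and the column names `bike_est`, `bike_moe`, and `total_moe` are guesses at the naming pattern described above (only `year` and `total_est` appear in the exercise code). It also pulls each year's values out as scalars, because subtracting two filtered pandas Series aligns on row labels and yields `NaN` when the two rows carry different index labels.

```python
import numpy as np
import pandas as pd

Z_CRIT = 1.645  # critical Z score for 90% confidence

# Hypothetical numbers for illustration only; not real ACS estimates
dc = pd.DataFrame({
    "year": [2011, 2017],
    "total_est": [300000, 370000],
    "total_moe": [5000, 5500],
    "bike_est": [9000, 18500],
    "bike_moe": [1200, 1800],
})

# Share of bike commuting
dc["bike_share"] = dc["bike_est"] / dc["total_est"]

# Standard errors of the estimates from MOEs
dc["se_bike"] = dc["bike_moe"] / Z_CRIT
dc["se_total"] = dc["total_moe"] / Z_CRIT

# Standard error of the proportion
dc["se_p"] = np.sqrt(
    dc["se_bike"]**2 - dc["bike_share"]**2 * dc["se_total"]**2
) / dc["total_est"]

# Index by year so each value can be read as a scalar
by_year = dc.set_index("year")
z = (by_year.loc[2017, "bike_share"] - by_year.loc[2011, "bike_share"]) / np.sqrt(
    by_year.loc[2017, "se_p"]**2 + by_year.loc[2011, "se_p"]**2
)
significant = bool(z > Z_CRIT)
print(significant)
```

With these invented inputs the shares are 3% and 5%, and the difference clears the 90% threshold comfortably; real data may not.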