Significance of Difference of Proportions
Bike commuting is still uncommon, but Washington, DC, has a decent share. That share has increased by more than 1 percentage point in the last few years, but is the increase statistically significant? In this exercise you will calculate the standard error of a proportion, then a two-sample Z-statistic for the difference between the two proportions.
The formula for the standard error (SE) of a proportion is:
$$SE_P = \frac{1}{N}\sqrt{SE_n^2 - P^2SE_N^2}$$
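As a rough illustration of this formula, the sketch below computes the SE of a proportion in plain Python. The function name se_proportion and all input numbers are made up for illustration and are not part of the exercise; the MOE-to-SE conversion assumes 90% margins of error, as used later in this exercise.

```python
from math import sqrt

def se_proportion(n, N, se_n, se_N):
    """SE of the proportion P = n / N (hypothetical helper, not from the course)."""
    p = n / N
    radicand = se_n**2 - p**2 * se_N**2
    if radicand < 0:
        # The formula has no real-valued result in this case
        raise ValueError("negative value under the square root")
    return sqrt(radicand) / N

# Made-up estimates and margins of error, for illustration only
Z_CRIT = 1.645            # critical Z score for 90% confidence
se_n = 1000 / Z_CRIT      # SE of the subpopulation estimate, from its MOE
se_N = 5000 / Z_CRIT      # SE of the population estimate, from its MOE
se_p = se_proportion(15000, 350000, se_n, se_N)
print(se_p)
```

Note that the second term under the square root is scaled by the squared proportion, so when P is small the result is dominated by the SE of the subpopulation estimate.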
The formula for the two-sample Z-statistic is:
$$Z = \frac{x_1 - x_2}{\sqrt{SE_{x_1}^2 + SE_{x_2}^2}}$$
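A minimal sketch of the Z-statistic, again in plain Python. The helper name two_sample_z and the input values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from math import sqrt

def two_sample_z(x1, x2, se_x1, se_x2):
    """Two-sample Z-statistic for the difference x1 - x2 (hypothetical helper)."""
    return (x1 - x2) / sqrt(se_x1**2 + se_x2**2)

# Made-up proportions and standard errors, for illustration only
z = two_sample_z(0.050, 0.035, 0.003, 0.004)

# Compare against the critical Z score for 90% confidence
significant = z > 1.645
print(z, significant)
```

Here the difference is 0.015 and the combined SE is 0.005, so Z is about 3, which exceeds 1.645.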
The DataFrame dc is loaded. It has columns (shown in the console) with estimates (ending "_est") and margins of error (ending "_moe") for total workers and bike commuters.
The sqrt function has been imported from the numpy module.
This exercise is part of the course
Analyzing US Census Data in Python
Exercise instructions
- Calculate bike_share by dividing the number of bikers by the total number of workers
- Calculate the SE of the estimate of bikers and total workers, by dividing the MOE by Z_CRIT
- Calculate the SE of the proportions: se_bike is the SE of the subpopulation \(SE_n\), bike_share is the proportion \(P\), and se_total is the SE of the population \(SE_N\)
- Calculate \(Z\): \(x_1\) and \(x_2\) are the bike_share in 2017 and 2011; \(SE_{x_1}\) and \(SE_{x_2}\) are se_p in 2017 and 2011
Hands-on interactive exercise
Try this exercise by completing the sample code.
# Set the critical Z score for 90% confidence
Z_CRIT = 1.645
# Calculate share of bike commuting
dc["bike_share"] = ____
# Calculate standard errors of the estimate from MOEs
dc["se_bike"] = ____
dc["se_total"] = ____
# Calculate the standard error of the proportion
dc["se_p"] = sqrt(____**2 - ____**2 * ____**2) / dc["total_est"]
# Calculate the two sample statistic between 2011 and 2017
Z = (dc[dc["year"] == 2017]["bike_share"] - ____) / \
    sqrt(____**2 + ____**2)
print(Z_CRIT < Z)
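
The steps in the template can be sketched end to end on a small stand-in DataFrame. This is only one possible way to fill in the blanks, not the course's reference solution. The numbers are invented, and the column names bike_est, bike_moe, and total_moe are assumptions based on the "_est" and "_moe" naming described above (only total_est appears in the template).

```python
import pandas as pd
from numpy import sqrt

# Hypothetical stand-in for dc; every value here is made up
dc = pd.DataFrame({
    "year": [2011, 2017],
    "total_est": [300000, 375000],
    "total_moe": [5000, 5500],
    "bike_est": [9500, 18500],   # assumed column name
    "bike_moe": [1300, 1700],    # assumed column name
})

# Set the critical Z score for 90% confidence
Z_CRIT = 1.645

# Share of bike commuting
dc["bike_share"] = dc["bike_est"] / dc["total_est"]

# Standard errors of the estimates, from their MOEs
dc["se_bike"] = dc["bike_moe"] / Z_CRIT
dc["se_total"] = dc["total_moe"] / Z_CRIT

# Standard error of the proportion
dc["se_p"] = sqrt(dc["se_bike"]**2
                  - dc["bike_share"]**2 * dc["se_total"]**2) / dc["total_est"]

# Take each year's row as scalars so the arithmetic does not
# depend on pandas index alignment between the two rows
row_2017 = dc.loc[dc["year"] == 2017].iloc[0]
row_2011 = dc.loc[dc["year"] == 2011].iloc[0]

# Two-sample statistic between 2011 and 2017
Z = (row_2017["bike_share"] - row_2011["bike_share"]) / \
    sqrt(row_2017["se_p"]**2 + row_2011["se_p"]**2)
print(Z_CRIT < Z)
```

Extracting scalar rows is a design choice for this sketch: subtracting two filtered one-row Series aligns on index labels, so if the 2011 and 2017 rows carry different labels the result is NaN rather than the difference.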