Annotating confidence intervals
Your data science work with pollution data is legendary, and you are now weighing job offers in both Cincinnati, Ohio and Indianapolis, Indiana. You want to see if the SO2 levels are significantly different in the two cities, and more specifically, which city has lower levels. To test this, you decide to look at the differences in the cities' SO2 values (Indianapolis' - Cincinnati's) over multiple years (provided as diffs_by_year).
Instead of just displaying a p-value for a significant difference between the cities, you decide to look at the 95% confidence intervals (columns lower and upper) of the differences. This allows you to see the magnitude of the differences along with any trends over the years.
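The exercise does not say how the lower and upper columns were computed. As a rough sketch of what such bounds mean, the snippet below builds a 95% interval for one year's mean difference using a normal approximation (mean plus or minus 1.96 standard errors). The diffs values are made up for illustration, and for a sample this small a t-based interval would usually be more appropriate.

```python
import math
import statistics

# Hypothetical SO2 differences (Indianapolis - Cincinnati) within one year.
# These numbers are made up for illustration; they are not course data.
diffs = [0.8, -0.3, 1.1, 0.4, -0.6, 0.9, 0.2, 0.5]

mean = statistics.mean(diffs)

# Standard error of the mean: sample standard deviation / sqrt(n)
std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))

# Normal-approximation 95% interval (an assumption, not the course's method)
lower = mean - 1.96 * std_err
upper = mean + 1.96 * std_err

# If the interval excludes 0, the difference is significant at the 5% level
excludes_zero = lower > 0 or upper < 0
print(f"mean={mean:.3f}, 95% CI=({lower:.3f}, {upper:.3f}), excludes 0: {excludes_zero}")
```

An interval that sits entirely to one side of 0 points to a consistent difference in that direction, which is what the null reference line in the plot makes easy to check.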
This exercise is part of the course
Improving Your Data Visualizations in Python
Instructions
- Provide starting and ending limits (columns lower and upper) for your confidence intervals to plt.hlines().
- Set interval thickness to 5.
- Draw a vertical line representing a difference of 0 with plt.axvline().
- Color the null line 'orangered' to make it stand out.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Set start and ends according to intervals
# Make intervals thicker
plt.hlines(y = 'year', xmin = '____', xmax = '____',
           linewidth = ____, color = 'steelblue', alpha = 0.7,
           data = diffs_by_year)
# Point estimates
plt.plot('mean', 'year', 'k|', data = diffs_by_year)
# Add a 'null' reference line at 0 and color orangered
plt.axvline(x = ____, color = '____', linestyle = '--')
# Set descriptive axis labels and title
plt.xlabel('95% CI')
plt.title('Avg SO2 differences between Cincinnati and Indianapolis')
plt.show()
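One way the template might be completed is sketched below; it follows the instructions but is not the course's official solution. The real diffs_by_year is not included in the exercise text, so the DataFrame here is a hypothetical stand-in with made-up numbers; only the column names (year, mean, lower, upper) come from the exercise. The Agg backend line is also an addition, so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (an addition); remove to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for diffs_by_year: values are made up for illustration
diffs_by_year = pd.DataFrame({
    "year":  [2013, 2014, 2015],
    "mean":  [0.8, 0.3, -0.2],
    "lower": [0.2, -0.4, -0.9],
    "upper": [1.4, 1.0, 0.5],
})

# One horizontal segment per year, from the lower to the upper bound,
# drawn with a thickness of 5
intervals = plt.hlines(y = 'year', xmin = 'lower', xmax = 'upper',
                       linewidth = 5, color = 'steelblue', alpha = 0.7,
                       data = diffs_by_year)

# Point estimates, marked with a black vertical tick
plt.plot('mean', 'year', 'k|', data = diffs_by_year)

# 'Null' reference line at a difference of 0, colored orangered
null_line = plt.axvline(x = 0, color = 'orangered', linestyle = '--')

# Descriptive axis label and title
plt.xlabel('95% CI')
plt.title('Avg SO2 differences between Cincinnati and Indianapolis')
plt.show()
```

Because the differences are Indianapolis minus Cincinnati, an interval lying entirely to the right of the orangered line suggests higher SO2 in Indianapolis that year, and one lying entirely to the left suggests higher SO2 in Cincinnati.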