Annotating confidence intervals
Your data science work with pollution data is legendary, and you are now weighing job offers in both Cincinnati, Ohio and Indianapolis, Indiana. You want to see if the SO2 levels are significantly different in the two cities, and more specifically, which city has lower levels. To test this, you decide to look at the differences in the cities' SO2 values (Indianapolis' - Cincinnati's) over multiple years (provided as diffs_by_year).
Instead of just displaying a p-value for a significant difference between the cities, you decide to look at the 95% confidence intervals (columns lower and upper) of the differences. This allows you to see the magnitude of the differences along with any trends over the years.
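As a rough illustration of how such intervals are read: an interval that lies entirely above or below 0 points to a difference in that direction, while one that straddles 0 does not. The sketch below assumes the difference is Indianapolis minus Cincinnati, as stated above; the rows and the helper name interpret are made up for illustration and are not the exercise's data.

```python
# Hypothetical (year, lower, upper) bounds of a 95% CI for the
# Indianapolis-minus-Cincinnati SO2 difference. Values are invented.
intervals = [
    (2013, -0.004, 0.002),
    (2014, 0.001, 0.006),
    (2015, -0.007, -0.001),
]

def interpret(lower, upper):
    """Classify one interval by where it sits relative to 0."""
    if lower > 0:
        return "Indianapolis higher"
    if upper < 0:
        return "Indianapolis lower"
    return "no clear difference"

results = {year: interpret(lo, hi) for year, lo, hi in intervals}
for year, verdict in results.items():
    print(year, verdict)
```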
This exercise is part of the course
Improving Your Data Visualizations in Python
Exercise instructions
- Provide starting and ending limits (columns lower and upper) for your confidence intervals to plt.hlines().
- Set interval thickness to 5.
- Draw a vertical line representing a difference of 0 with plt.axvline().
- Color the null line 'orangered' to make it stand out.
Interactive exercise
Try this exercise by completing this sample code.
# Set start and ends according to intervals
# Make intervals thicker
plt.hlines(y = 'year', xmin = '____', xmax = '____',
linewidth = ____, color = 'steelblue', alpha = 0.7,
data = diffs_by_year)
# Point estimates
plt.plot('mean', 'year', 'k|', data = diffs_by_year)
# Add a 'null' reference line at 0 and color orangered
plt.axvline(x = ____, color = '____', linestyle = '--')
# Set descriptive axis labels and title
plt.xlabel('95% CI')
plt.title('Avg SO2 differences between Cincinnati and Indianapolis')
plt.show()
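One possible way to fill in the blanks, following the instructions above, is sketched here so it can be run outside the course environment. The diffs_by_year values are invented stand-ins (the real data is supplied by the exercise), a plain dict is used in place of the course's table, and the wrapper function plot_intervals is a hypothetical name added only to keep the sketch self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to view interactively
import matplotlib.pyplot as plt

# Made-up stand-in for the exercise's diffs_by_year (not the real data).
diffs_by_year = {
    "year":  [2013, 2014, 2015],
    "mean":  [-0.001, 0.0035, -0.004],
    "lower": [-0.004, 0.001, -0.007],
    "upper": [0.002, 0.006, -0.001],
}

def plot_intervals(data):
    """Draw one horizontal interval per year plus a null line at 0."""
    plt.figure()
    # Interval endpoints come from the lower and upper columns; thickness 5
    plt.hlines(y="year", xmin="lower", xmax="upper",
               linewidth=5, color="steelblue", alpha=0.7,
               data=data)
    # Point estimates
    plt.plot("mean", "year", "k|", data=data)
    # 'Null' reference line at a difference of 0, colored orangered
    plt.axvline(x=0, color="orangered", linestyle="--")
    plt.xlabel("95% CI")
    plt.title("Avg SO2 differences between Cincinnati and Indianapolis")
    return plt.gca()

ax = plot_intervals(diffs_by_year)
```

To see the figure, call plt.show() with an interactive backend or plt.savefig() with a file name of your choice.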