Annotating confidence intervals

Your data science work with pollution data is legendary, and you are now weighing job offers in both Cincinnati, Ohio and Indianapolis, Indiana. You want to see if the SO2 levels are significantly different in the two cities and, more specifically, which city has lower levels. To test this, you decide to look at the differences in the cities' SO2 values (Indianapolis minus Cincinnati) over multiple years (provided as diffs_by_year). With this sign convention, a positive difference means Indianapolis had the higher SO2 level, and a negative difference means Cincinnati did.

Instead of just displaying a p-value for a significant difference between the cities, you decide to look at the 95% confidence intervals (columns lower and upper) of the differences. This allows you to see the magnitude of the differences along with any trends over the years.
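To make the reading of such intervals concrete, here is a minimal sketch. The values are made up for illustration (they are not the exercise data), and the `excludes_zero` and `width` columns are hypothetical helpers; only the column names year, mean, lower and upper follow the exercise. An interval that lies entirely on one side of 0 suggests a difference at roughly the 5% level, and its sign indicates which city had the higher level under the Indianapolis minus Cincinnati convention.

```python
import pandas as pd

# Hypothetical values for illustration only; not the exercise's data
example_diffs = pd.DataFrame({
    'year':  [2013, 2014, 2015],
    'mean':  [1.5, 0.4, -0.9],
    'lower': [0.6, -0.5, -1.6],
    'upper': [2.4, 1.3, -0.2],
})

# An interval excludes 0 when it lies entirely above or entirely below 0
example_diffs['excludes_zero'] = (
    (example_diffs['lower'] > 0) | (example_diffs['upper'] < 0)
)

# Interval width reflects how precisely the difference is estimated
example_diffs['width'] = example_diffs['upper'] - example_diffs['lower']

print(example_diffs)
```

In this made-up table, the first and last intervals exclude 0 while the middle one straddles it, so only the middle year would be read as showing no clear difference.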

This exercise is part of the course

Improving Your Data Visualizations in Python

Exercise instructions

  • Provide starting and ending limits (columns lower and upper) for your confidence intervals to plt.hlines().
  • Set interval thickness to 5.
  • Draw a vertical line representing a difference of 0 with plt.axvline().
  • Color the null line 'orangered' to make it stand out.

Interactive exercise

Try this exercise by completing the sample code below.

# Set interval starts and ends
# Make intervals thicker
plt.hlines(y = 'year', xmin = '____', xmax = '____', 
           linewidth = ____, color = 'steelblue', alpha = 0.7,
           data = diffs_by_year)
# Point estimates
plt.plot('mean', 'year', 'k|', data = diffs_by_year)

# Add a 'null' reference line at 0 and color orangered
plt.axvline(x = ____, color = '____', linestyle = '--')

# Set descriptive axis labels and title
plt.xlabel('95% CI')
plt.title('Avg SO2 differences between Cincinnati and Indianapolis')
plt.show()
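
One way the blanks might be filled in, shown as a self-contained sketch rather than an official solution. The diffs_by_year values below are invented stand-ins so the snippet runs on its own (the exercise supplies the real data), and the Agg backend, the `null_line` variable and the output filename are assumptions made so it runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented stand-in for the exercise's diffs_by_year; values are illustrative only
diffs_by_year = pd.DataFrame({
    'year':  [2013, 2014, 2015],
    'mean':  [1.5, 0.4, -0.9],
    'lower': [0.6, -0.5, -1.6],
    'upper': [2.4, 1.3, -0.2],
})

# Intervals run from the lower to the upper column, drawn with linewidth 5
plt.hlines(y='year', xmin='lower', xmax='upper',
           linewidth=5, color='steelblue', alpha=0.7,
           data=diffs_by_year)

# Point estimates as black vertical tick markers
plt.plot('mean', 'year', 'k|', data=diffs_by_year)

# 'Null' reference line at a difference of 0, colored orangered
null_line = plt.axvline(x=0, color='orangered', linestyle='--')

# Descriptive axis label and title, as in the template
plt.xlabel('95% CI')
plt.title('Avg SO2 differences between Cincinnati and Indianapolis')
plt.savefig('so2_diff_intervals.png')  # hypothetical filename; plt.show() works interactively
```

Intervals that cross the dashed line include a difference of 0, while those that sit entirely to one side of it point to a city with consistently lower SO2 in that year.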