Get startedGet started for free

90, 95, and 99% intervals

You are a data scientist for an outdoor adventure company in Fairbanks, Alaska. Recently, customers have been having issues with SO2 pollution, leading to costly cancellations. The company has sensors for CO, NO2, and O3 but not SO2 levels.

You've built a model that predicts SO2 values based on the values of pollutants with sensors (loaded as pollution_model, a statsmodels object). You want to investigate which pollutant's value has the largest effect on your model's SO2 prediction. This will help you know which pollutant's values to pay most attention to when planning outdoor tours. To maximize the amount of information in your report, show multiple levels of uncertainty for the model estimates.

This exercise is part of the course

Improving Your Data Visualizations in Python

View Course

Exercise instructions

  • Fill in the appropriate interval width percents (from 90,95, and 99%) according to the values list in alpha.
  • In the for loop, color the interval by its assigned color.
  • Pass the loop's width percentage value to plt.hlines() to label the legend.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Add interval percent widths
alphas = [     0.01,  0.05,   0.1] 
widths = [ '__% CI', '__%', '__%']
colors = ['#fee08b','#fc8d59','#d53e4f']

for alpha, color, width in zip(alphas, colors, widths):
    # Grab confidence interval
    conf_ints = pollution_model.conf_int(alpha)
    
    # Pass current interval color and legend label to plot
    plt.hlines(y = conf_ints.index, xmin = conf_ints[0], xmax = conf_ints[1],
               colors = ____, ____ = width, linewidth = 10) 

# Draw point estimates
plt.plot(pollution_model.params, pollution_model.params.index, 'wo', label = 'Point Estimate')

plt.legend()
plt.show() 
Edit and Run Code