90, 95, and 99% intervals
You are a data scientist for an outdoor adventure company in Fairbanks, Alaska. Recently, customers have been having issues with SO2 pollution, leading to costly cancellations. The company has sensors for CO, NO2, and O3 but not SO2 levels.
You've built a model that predicts SO2 values based on the values of pollutants with sensors (loaded as pollution_model, a statsmodels object). You want to investigate which pollutant's value has the largest effect on your model's SO2 prediction. This will help you know which pollutant's values to pay most attention to when planning outdoor tours. To maximize the amount of information in your report, show multiple levels of uncertainty for the model estimates.
This exercise is part of the course Improving Your Data Visualizations in Python.
Exercise instructions
- Fill in the appropriate interval width percents (from 90, 95, and 99%) according to the values listed in alphas.
- In the for loop, color the interval by its assigned color.
- Pass the loop's width percentage value to plt.hlines() to label the legend.
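If the link between a significance level and an interval width is unclear, the following minimal sketch may help. It assumes the usual convention that an alpha value corresponds to a (1 - alpha) confidence interval; the helper name width_label is hypothetical and not part of the exercise.

```python
# alpha is the significance level; the matching interval width is (1 - alpha).
alphas = [0.01, 0.05, 0.1]

def width_label(alpha):
    """Return a legend label such as '95%' for a significance level such as 0.05.

    Hypothetical helper for illustration only.
    """
    # round() guards against floating-point noise in (1 - alpha) * 100
    return f"{round((1 - alpha) * 100)}%"

# One label per alpha, in the same order as the alphas list
labels = [width_label(a) for a in alphas]
print(labels)
```

Each label lines up with the alpha at the same position in the list, which is the pairing that zip() relies on in the loop.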
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Add interval percent widths
alphas = [ 0.01, 0.05, 0.1]
widths = [ '__% CI', '__%', '__%']
colors = ['#fee08b','#fc8d59','#d53e4f']
for alpha, color, width in zip(alphas, colors, widths):
    # Grab confidence interval
    conf_ints = pollution_model.conf_int(alpha)
    # Pass current interval color and legend label to plot
    plt.hlines(y = conf_ints.index, xmin = conf_ints[0], xmax = conf_ints[1],
               colors = ____, ____ = width, linewidth = 10)
# Draw point estimates
plt.plot(pollution_model.params, pollution_model.params.index, 'wo', label = 'Point Estimate')
plt.legend()
plt.show()
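The alphas are listed from smallest to largest, so the widest interval is drawn first and the narrower ones are layered on top of it. The sketch below illustrates why a smaller alpha gives a wider interval. It uses a normal approximation from the standard library rather than whatever pollution_model.conf_int() computes internally, and the estimate and standard error are made-up values, not results from the exercise's model.

```python
from statistics import NormalDist

def normal_interval(estimate, std_err, alpha):
    """Two-sided (1 - alpha) interval under a normal approximation.

    Illustrative helper only; not the statsmodels implementation.
    """
    # Critical value that leaves alpha / 2 probability in each tail
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * std_err, estimate + z * std_err

# Hypothetical coefficient estimate and standard error
estimate, std_err = 0.5, 0.1

for alpha in [0.01, 0.05, 0.1]:
    lower, upper = normal_interval(estimate, std_err, alpha)
    print(f"{1 - alpha:.0%} interval: ({lower:.3f}, {upper:.3f})")
```

Because the 99% interval extends furthest from the point estimate, plotting it first keeps its ends visible beneath the 95% and 90% bars.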