Basic confidence intervals
You are a data scientist for a fireworks manufacturer in Des Moines, Iowa. You need to make a case to the city that your company's large fireworks show has not harmed the city's air. To do this, you look at the average pollutant levels in the week after the Fourth of July and compare them to the readings taken after your last show. By showing confidence intervals around the averages, you can argue that the recent readings were well within the normal range.
This data is loaded as average_ests, with one row for each measured pollutant.
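The exercise's sample code reads the columns 'pollutant', 'mean', 'std_err' and 'seen' from average_ests. The sketch below assumes those columns and uses a hypothetical helper, flag_outside_ci. It marks any pollutant whose observed reading falls outside mean ± 1.96 standard errors, which is the comparison the argument to the city rests on. The numbers in the toy frame are made up for illustration and are not real Des Moines air-quality data.

```python
import pandas as pd


def flag_outside_ci(df: pd.DataFrame, z: float = 1.96) -> pd.Series:
    """Return True for rows whose observed reading ('seen') lies outside mean +/- z * std_err.

    Hypothetical helper; assumes columns 'mean', 'std_err', and 'seen'.
    """
    lower = df["mean"] - z * df["std_err"]
    upper = df["mean"] + z * df["std_err"]
    return (df["seen"] < lower) | (df["seen"] > upper)


# Made-up toy values shaped like average_ests (NOT real measurements).
toy = pd.DataFrame({
    "pollutant": ["CO", "NO2", "O3", "SO2"],
    "mean": [0.30, 10.0, 0.030, 1.0],
    "std_err": [0.05, 2.0, 0.004, 0.2],
    "seen": [0.35, 15.0, 0.028, 0.9],
})

toy["outside_ci"] = flag_outside_ci(toy)
print(toy[["pollutant", "seen", "outside_ci"]])
```

With these invented numbers, only the NO2 reading of 15.0 falls outside its interval (6.08 to 13.92). The case made in the exercise amounts to showing that no row is flagged.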
This exercise is part of the course Improving Your Data Visualizations in Python.
Exercise instructions
Create the lower and upper 95% interval boundaries:
- Create the lower boundary by subtracting 1.96 standard errors ('std_err') from the 'mean' of the estimates.
- Create the upper boundary by adding 1.96 standard errors ('std_err') to the 'mean' of the estimates.
Pass pollutant as the faceting variable to sns.FacetGrid() and unlink the x-axes of the plots, so each interval is drawn at a scale suited to its own pollutant.
Pass the constructed interval boundaries to the mapped plt.hlines() function.
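The 1.96 multiplier is the 97.5th percentile of the standard normal distribution. Under a normal approximation, mean ± 1.96 standard errors therefore leaves 2.5% of probability in each tail, for 95% coverage overall. The short sketch below assumes SciPy is available and uses a hypothetical helper, ci_multiplier. It recovers that constant and shows how a different confidence level would change the multiplier.

```python
from scipy.stats import norm


def ci_multiplier(level: float) -> float:
    """Two-sided normal critical value for a confidence level such as 0.95 (hypothetical helper)."""
    # Put (1 - level) / 2 of probability in each tail, then invert the normal CDF.
    return float(norm.ppf(1 - (1 - level) / 2))


z95 = ci_multiplier(0.95)
z99 = ci_multiplier(0.99)
print(round(z95, 2), round(z99, 3))  # 1.96 2.576
```

A wider 99% interval would use about 2.576 standard errors in place of 1.96, which makes it harder for an observed reading to fall outside the range.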
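The sample code below completes the exercise. It assumes average_ests is already loaded with the 'pollutant', 'mean', 'std_err', 'seen' and 'y' columns that it references.
import matplotlib.pyplot as plt
import seaborn as sns
# Construct 95% CI bounds for averages: mean +/- 1.96 standard errors
average_ests['lower'] = average_ests['mean'] - 1.96*average_ests['std_err']
average_ests['upper'] = average_ests['mean'] + 1.96*average_ests['std_err']
# Set up a grid of plots, one row per pollutant, with non-shared x-axis limits
g = sns.FacetGrid(average_ests, row = 'pollutant', sharex = False)
# Plot the CI for each average estimate as a horizontal line from 'lower' to 'upper'
g.map(plt.hlines, 'y', 'lower', 'upper')
# Plot observed values for comparison and remove axes labels
g.map(plt.scatter, 'seen', 'y', color = 'orangered').set_ylabels('').set_xlabels('')
plt.show()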