Lots of bootstraps with beeswarms
As a current resident of Cincinnati, you're curious how its average NO2 values compare with those of Des Moines, Indianapolis, and Houston, a few other cities you've lived in.
To investigate, you decide to use bootstrap estimation of the mean NO2 value for each city. Because the comparisons are of primary interest, you will use a swarm plot to compare the estimates.
The DataFrame pollution_may is provided along with the bootstrap() function seen in the slides for performing your bootstrap resampling.
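The body of bootstrap() is not reproduced on this page. As a rough sketch only, assuming it takes a vector of values and a number of resamples and returns one resample mean per draw, it might look something like the following. The sample readings are made up for illustration and are not taken from pollution_may.

```python
import numpy as np

def bootstrap(data, n_boots):
    """Hypothetical stand-in for the course helper: return n_boots resample means."""
    data = np.asarray(data)
    # Each resample draws len(data) values with replacement, then takes the mean
    return [np.mean(np.random.choice(data, len(data))) for _ in range(n_boots)]

# Made-up NO2 readings, for illustration only
sample_NO2 = [18.0, 22.5, 19.25, 25.0, 21.75]
boot_means = bootstrap(sample_NO2, 100)
```

Under that assumption, bootstrap(city_NO2, 100) yields 100 estimates of the mean, and the spread of those estimates is what the swarm plot displays for each city.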
This exercise is part of the course
Improving Your Data Visualizations in Python
Exercise instructions
- Run bootstrap resampling on each city_NO2 vector.
- Add city name as a column in the bootstrap DataFrame, cur_boot.
- Color all swarm plot points 'coral' to avoid the color-size problem.
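The second instruction relies on how pandas builds a DataFrame from a dictionary: a list-like value sets the number of rows, and a scalar value is repeated on every row. A minimal sketch with made-up numbers (not real bootstrap output) shows the pattern, along with how pd.concat stacks one city's rows under another's:

```python
import pandas as pd

# Made-up bootstrap means, for illustration only
fake_means = [20.1, 21.4, 19.8]

# The list sets the row count; the scalar string is repeated on every row
cur_boot = pd.DataFrame({'NO2_avg': fake_means, 'city': 'Cincinnati'})

# Stack a second city's rows under the first, as the exercise loop does
other_boot = pd.DataFrame({'NO2_avg': [15.2, 16.0, 15.7], 'city': 'Houston'})
city_boots = pd.concat([cur_boot, other_boot])
```

Having the city label on every row is what lets the swarm plot group the estimates by city along the y axis.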
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Initialize a holder DataFrame for bootstrap results
city_boots = pd.DataFrame()

for city in ['Cincinnati', 'Des Moines', 'Indianapolis', 'Houston']:
    # Filter to city
    city_NO2 = pollution_may[pollution_may.city == city].NO2
    # Bootstrap city data & put in DataFrame
    cur_boot = pd.DataFrame({'NO2_avg': bootstrap(____, 100), 'city': ____})
    # Append to the other cities' bootstraps
    city_boots = pd.concat([city_boots, cur_boot])

# Beeswarm plot of averages with cities on the y axis
sns.swarmplot(y = "city", x = "NO2_avg", data = city_boots, ____ = '____')
plt.show()