Lots of bootstraps with beeswarms

As a current resident of Cincinnati, you're curious to see how its average NO2 values compare to those of Des Moines, Indianapolis, and Houston, a few other cities you've lived in.

To investigate, you decide to use bootstrap estimation of the mean NO2 value for each city. Because the comparisons between cities are of primary interest, you will use a swarm plot to show the bootstrap estimates side by side.

The DataFrame pollution_may is provided along with the bootstrap() function seen in the slides for performing your bootstrap resampling.
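The bootstrap() function from the slides is not reproduced on this page. As a rough illustration of what bootstrap resampling of a mean involves, here is a minimal sketch using only the standard library. The name bootstrap_means, its seed parameter, and the sample readings are assumptions made for this example, and the course's own bootstrap() may be implemented differently.

```python
import random
import statistics


def bootstrap_means(values, n_boots, seed=None):
    """Return n_boots means, each computed from a resample of `values`
    drawn with replacement and of the same size as the original data.

    Hypothetical helper for illustration; not the course's bootstrap().
    """
    rng = random.Random(seed)
    values = list(values)
    return [
        statistics.mean(rng.choices(values, k=len(values)))
        for _ in range(n_boots)
    ]


# Made-up NO2 readings, for illustration only
sample_no2 = [12.1, 15.4, 9.8, 20.3, 14.7, 11.2, 18.9]
boot_means = bootstrap_means(sample_no2, 100, seed=0)
print(len(boot_means), min(boot_means), max(boot_means))
```

The spread of the resampled means gives a sense of how uncertain the estimate of the true mean is, which is what each city's swarm of points conveys in the plot.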

This exercise is part of the course

Improving Your Data Visualizations in Python


Exercise instructions

  • Run bootstrap resampling on each city_NO2 vector.
  • Add city name as a column in the bootstrap DataFrame, cur_boot.
  • Color all swarm plot points 'coral' to avoid the color-size problem.
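The second instruction relies on how pandas builds a DataFrame from a dict that mixes a list with a single value: the single value is repeated on every row. The sketch below shows that pattern on made-up numbers. The names fake_boots, frames, and stacked are hypothetical and are not part of the exercise.

```python
import pandas as pd

# Hypothetical bootstrap means for two cities (made-up numbers)
fake_boots = {
    'Cincinnati': [14.2, 15.1, 13.8],
    'Houston': [18.4, 17.9, 19.0],
}

frames = []
for city_name, means in fake_boots.items():
    # The scalar city_name is broadcast to every row of the list-valued column
    frames.append(pd.DataFrame({'NO2_avg': means, 'city': city_name}))

# Stacking the per-city frames gives long-format data:
# one row per bootstrap mean, labelled with its city
stacked = pd.concat(frames, ignore_index=True)
print(stacked)
```

Long-format data like this, with one value column and one label column, is the shape that categorical plots such as a swarm plot expect.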

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Initialize a holder DataFrame for bootstrap results
city_boots = pd.DataFrame()

for city in ['Cincinnati', 'Des Moines', 'Indianapolis', 'Houston']:
    # Filter to city
    city_NO2 = pollution_may[pollution_may.city == city].NO2
    # Bootstrap city data & put in DataFrame
    cur_boot = pd.DataFrame({'NO2_avg': bootstrap(____, 100), 'city': ____})
    # Append to the other cities' bootstraps
    city_boots = pd.concat([city_boots, cur_boot])

# Beeswarm plot of averages with cities on the y-axis
sns.swarmplot(y="city", x="NO2_avg", data=city_boots, ____='____')

plt.show()