Plot all of your data: Bee swarm plots

1. Plot all of your data: Bee swarm plots

The histogram of county-level election data was informative.

2. 2008 US swing state election results

We learned that more counties voted for McCain than for Obama. Since our goal is to learn from data, this is great! However, a major drawback of using histograms

3. 2008 US swing state election results

is that the same data set can look different depending on how the bins are chosen. And choice of bins is in many ways arbitrary. This leads to

4. Binning bias

binning bias: you might interpret your plot differently for two different choices of the number of bins. An additional problem with histograms is that we are not plotting all of the data. We are sweeping the data into bins and losing their actual values.
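To see binning bias concretely, here is a minimal sketch that bins the same values two different ways. The data are synthetic stand-ins generated with NumPy, not the election data, and the variable name `dem_share` is an assumption made for illustration.

```python
import numpy as np

# Synthetic stand-in for county-level vote shares (in percent).
# These are NOT the real election results.
rng = np.random.default_rng(42)
dem_share = rng.normal(loc=45, scale=10, size=200).clip(0, 100)

# Bin the same data with two different choices of the number of bins.
counts_coarse, edges_coarse = np.histogram(dem_share, bins=5)
counts_fine, edges_fine = np.histogram(dem_share, bins=20)

# Every value lands in exactly one bin either way, but the shape
# suggested by the counts can look quite different.
print("5 bins: ", counts_coarse)
print("20 bins:", counts_fine)
```

Both calls summarize the same 200 values. Only the counts per bin survive, which is the sense in which a histogram does not plot all of the data.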

5. Bee swarm plot

To remedy these problems, we can make a bee swarm plot, also called a swarm plot. This is best shown by example. Here is a bee swarm plot of Obama's vote share in the three swing states. Each point in the plot represents the share of the vote Obama got in a single county. The position along the y-axis is the quantitative information. The data are spread in x to make them visible, but their precise location along the x-axis is unimportant. Notably, we no longer have any binning bias and all data are displayed. This plot may be conveniently generated using Seaborn.

6. Organization of the data frame

A requirement is that your data are in a well-organized Pandas DataFrame

7. Organization of the data frame

where each column is a feature and

8. Organization of the data frame

each row an observation. In this case, an observation is a county, and the features are state and the Democratic share of the vote.
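As a rough illustration of that layout, the sketch below builds a tiny DataFrame with one row per county and one column per feature. The column names `state` and `dem_share`, the state labels, and all of the numbers are made up for this example; the real data set is not reproduced here.

```python
import pandas as pd

# Hypothetical data: each row is one county (an observation),
# each column is one feature. Values are invented for illustration.
df_swing = pd.DataFrame(
    {
        "state": ["A", "A", "B", "B", "C", "C"],
        "dem_share": [60.1, 43.2, 38.6, 52.4, 47.9, 35.5],
    }
)

print(df_swing)
```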

9. Generating a bee swarm plot

To make the plot, you need to specify which column gives the values for the y-axis, in this case the share of the vote that went to the Democrat, Barack Obama, and which column gives the values for the x-axis, in this case the state. And of course, you need to tell it which DataFrame contains the data.
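Those three pieces of information map onto the `x`, `y`, and `data` arguments of Seaborn's `swarmplot` function. The following is a minimal sketch, assuming a DataFrame named `df_swing` with columns `state` and `dem_share`; those names, the state labels, and the values are hypothetical placeholders rather than the actual election data.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data: one row per county, invented values.
df_swing = pd.DataFrame(
    {
        "state": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
        "dem_share": [60.1, 43.2, 48.7, 38.6, 52.4, 41.0, 47.9, 35.5, 55.3],
    }
)

# x is the categorical column, y is the quantitative column,
# and data is the DataFrame that holds both.
ax = sns.swarmplot(x="state", y="dem_share", data=df_swing)

# Label the axes so the plot is readable on its own.
ax.set_xlabel("state")
ax.set_ylabel("percent of vote for Obama")
```

In an interactive session you would drop the `matplotlib.use("Agg")` line and call `plt.show()` to display the figure.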

10. 2008 US swing state election results

From this plot, too, we can clearly see that Obama got less than 50% of the vote in the majority of counties in each of the three swing states. This time it is more detailed than a histogram, but without too much added visual complexity.

11. Let's practice!

Now it's your turn to make some bee swarm plots!
