
Variance and standard deviation

1. Variance and standard deviation

Once again, let's look at the 2008

2. 2008 US swing state election results

swing state data on the county level and think about other summary statistics we can calculate. In this bee swarm plot, I also show the mean of each state as a horizontal line. Looking at this plot, the mean seems to capture the magnitude of the data, but what about the variability, or spread, of the data? Florida seems to have more county-to-county variability than Pennsylvania or Ohio.

3. Variance

We can quantify this spread with the variance. The variance is the average of the squared distances from the mean. That definition was a mouthful. Let's parse it more carefully with a graphical example, looking specifically at Florida.
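Written as a formula, the definition reads as follows. The notation is chosen here for illustration: $x_i$ is the vote share in county $i$, there are $n$ counties, and $\bar{x}$ is their mean.

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \text{variance} = \frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2
$$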

4. 2008 Florida election results

For each data point,

5. 2008 Florida election results

we square the distance from the mean, and then take the average of all of these values.
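A minimal sketch of these steps done by hand with NumPy. The array `dem_share` and its values are made up for illustration; they are not the actual 2008 Florida county results.

```python
import numpy as np

# Made-up county-level vote shares (percent); NOT the real 2008 Florida data
dem_share = np.array([40.0, 45.0, 50.0, 55.0, 60.0])

# Distance of each data point from the mean
differences = dem_share - np.mean(dem_share)

# Square each distance
diff_sq = differences ** 2

# Variance: the average of the squared distances
variance_explicit = np.mean(diff_sq)
print(variance_explicit)  # 50.0
```

Here the mean is 50, the squared distances are 100, 25, 0, 25 and 100, and their average is 50.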

6. Computing the variance

The variance calculation is implemented in the np.var() function. Now, because the variance is computed from squared quantities, it does not have the same units as what we measured, in this case the vote share for Obama; its units are squared. Therefore, we are interested in

7. Computing the standard deviation

the square root of the variance, which is called the standard deviation. This is calculated with the np.std() function, and the result is the same as taking the square root of the variance.
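A short sketch comparing the two NumPy functions, again on made-up values rather than the real Florida data. It checks that `np.std()` agrees with the square root of `np.var()`.

```python
import numpy as np

# Made-up county-level vote shares (percent); NOT the real 2008 Florida data
dem_share = np.array([40.0, 45.0, 50.0, 55.0, 60.0])

# np.var averages the squared distances from the mean (ddof=0 by default)
variance = np.var(dem_share)

# np.std is the square root of that variance, so it is back in percent
std = np.std(dem_share)

print(variance)                             # 50.0
print(np.isclose(std, np.sqrt(variance)))   # True
```

With the default `ddof=0`, NumPy divides by the number of data points, which matches the "average of the squared distances" definition; passing `ddof=1` would divide by n - 1 instead.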

8. 2008 Florida election results

When we look at the swarm plot, the standard deviation appears to be a reasonable measure of the typical spread of the data.

9. Let's practice!

Ok, let's practice computing some variances and standard deviations!