Measures of spread

1. Measures of spread

Now let's discuss another category of summary statistics - measures of spread.

2. What is spread?

Spread describes how far apart data points are. Here is a histogram of vehicle crimes across London Boroughs. Comparing this to a histogram of burglaries, we can see the spread is much narrower.

3. Why is spread important?

Spread is important because it tells us how much variety may occur in our data. For example, if t-shirts typically cost 30 dollars, but can be anywhere from 10 dollars to 200 dollars, then how likely is it we will find one equal to 30 dollars? Does this change if t-shirt costs are between 20 and 50 dollars? So what measures of spread exist?

4. Range

The first is the range, which is the difference between the maximum and minimum values. For example, here are the London Boroughs with the largest and smallest number of burglaries. The range would be 5,183 minus 1,432, highlighting that Kingston upon Thames had 3,751 fewer burglaries than Tower Hamlets in the last two years!
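
As a minimal sketch of that calculation in Python: only the smallest (1,432) and largest (5,183) counts below come from the example; the other values and the helper name `value_range` are made up for illustration.

```python
# Hypothetical burglary counts. Only 1432 and 5183 come from the example;
# the values in between are invented so the list has something to search.
burglaries = [1432, 2105, 2760, 3318, 4020, 5183]


def value_range(values):
    """Return the difference between the maximum and minimum values."""
    return max(values) - min(values)


data_range = value_range(burglaries)
print(data_range)  # 3751
```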

5. Variance

Another measure is variance, which calculates the average squared distance from each data point to the mean. This plot shows the distribution of crimes per London Borough, with the mean displayed as a vertical line. We can see one borough is much further from the mean than the others.

6. Variance

To calculate the variance, we first measure the distance from each data point to the mean value. For example, the total number of crimes in the London Borough of Westminster is 94,923, and the mean number of crimes across all boroughs is 47,672. To calculate the distance, we subtract the mean from the number of crimes in Westminster: 94,923 minus 47,672.

7. Variance

We repeat this by calculating the distances for each borough and adding them up. Unfortunately, if we add up all the distances, then negative values will cancel out positives, so in this case we end up with an overall distance of zero.
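
The cancelling-out effect is easy to check with a small sketch. The Westminster figures are the ones quoted in the lesson; the five-value `crimes` list is a hypothetical toy dataset, not the real borough data.

```python
# Distance for Westminster, using the two figures quoted in the lesson.
westminster_distance = 94923 - 47672
print(westminster_distance)  # 47251

# Hypothetical toy dataset (not the real borough data).
crimes = [20, 30, 40, 50, 110]
mean = sum(crimes) / len(crimes)  # 50.0

# Signed distance from each value to the mean.
distances = [x - mean for x in crimes]
print(distances)  # [-30.0, -20.0, -10.0, 0.0, 60.0]

# Negative and positive distances cancel each other out.
total_distance = sum(distances)
print(total_distance)  # 0.0
```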

8. Variance

To avoid this, we square each distance and add them together. This gives a total of over seven billion.

9. Variance

As the data represents all crime in London, the variance is the sum of the squared distances divided by the number of boroughs, which is 32. It's important to note that the variance is in squared units: in this case, the count of total crime, squared.
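
Putting the steps together, here is a sketch of the calculation on a hypothetical five-value dataset (not the real borough data). It divides by the number of values, as described above for data that covers the whole population, and compares the result with `pvariance` from Python's standard `statistics` module.

```python
import statistics

# Hypothetical toy dataset (not the real borough data).
crimes = [20, 30, 40, 50, 110]
mean = sum(crimes) / len(crimes)  # 50.0

# Square each distance so negatives no longer cancel positives.
squared_distances = [(x - mean) ** 2 for x in crimes]
sum_of_squares = sum(squared_distances)  # 5000.0

# Population variance: divide by the number of data points.
population_variance = sum_of_squares / len(crimes)
print(population_variance)  # 1000.0

# The standard library gives the same value.
print(statistics.pvariance(crimes) == population_variance)  # True
```

If the data were only a sample of a larger population, the usual convention is to divide by one less than the number of values instead, which is what `statistics.variance` does.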

10. Standard deviation

It's difficult to understand the count of total crime squared. Therefore, we can convert it to the units of our data by taking the square root of the variance, a measure known as the standard deviation. This gives us a value of 15,319. Generally, the closer the standard deviation is to zero, the more closely clustered the data is around the mean.
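
A short sketch of the square-root step, again on a hypothetical toy dataset. The last lines work backwards from the rounded standard deviation quoted above (15,319) and the 32 boroughs to check that the implied sum of squared distances is indeed a little over seven billion; because 15,319 is rounded, that total is only approximate.

```python
import math
import statistics

# Hypothetical toy dataset (not the real borough data).
crimes = [20, 30, 40, 50, 110]

# Standard deviation is the square root of the (population) variance.
population_variance = statistics.pvariance(crimes)
standard_deviation = math.sqrt(population_variance)
print(round(standard_deviation, 2))  # 31.62

# Consistency check with the lesson's figures (approximate, since the
# quoted standard deviation is rounded).
quoted_std = 15319
boroughs = 32
implied_variance = quoted_std ** 2             # 234671761
implied_sum_of_squares = implied_variance * boroughs
print(implied_sum_of_squares)  # 7509496352
```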

11. Standard deviation in a histogram

Here we can see how dispersed the data is: the histogram marks distances of one and two standard deviations away from the mean.
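
Using the mean (47,672) and standard deviation (15,319) quoted earlier, the positions of those markers can be worked out directly. This is only a sketch of the arithmetic; the variable names are illustrative.

```python
# Figures quoted in the lesson.
mean_crimes = 47672
std_crimes = 15319
westminster = 94923

# Boundaries one and two standard deviations either side of the mean.
one_sd = (mean_crimes - std_crimes, mean_crimes + std_crimes)
two_sd = (mean_crimes - 2 * std_crimes, mean_crimes + 2 * std_crimes)
print(one_sd)  # (32353, 62991)
print(two_sd)  # (17034, 78310)

# How many standard deviations Westminster sits above the mean.
westminster_sds = (westminster - mean_crimes) / std_crimes
print(round(westminster_sds, 2))  # 3.08
```

Westminster lies roughly three standard deviations above the mean, well outside the two-standard-deviation band.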

12. Quartiles

We can also measure spread using quartiles, which split the data into four equal parts. Here, we see the minimum value for various crimes in London, followed by the values at 25%, 50%, 75%, and 100%. The first three are the quartiles, and the value at 100% is the maximum. Each percentage is the share of values that are less than or equal to the number shown.

13. Quartiles

We can see that 75% of London boroughs had 4,392 or fewer burglaries in the last two years. Note that the second quartile is the middle value, so it is equal to the median.
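
As a sketch, quartiles can be computed with `statistics.quantiles` from Python's standard library (available from Python 3.8). The nine values below are hypothetical, and `method="inclusive"` is one of several conventions; other methods can give slightly different quartiles on the same data.

```python
import statistics

# Hypothetical counts for nine boroughs (not real data).
counts = [10, 20, 30, 40, 50, 60, 70, 80, 90]

# n=4 splits the data into four parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(counts, n=4, method="inclusive")
print(q1, q2, q3)  # 30.0 50.0 70.0

# The second quartile is the median.
print(q2 == statistics.median(counts))  # True

# Share of values less than or equal to the third quartile.
share_at_or_below_q3 = sum(x <= q3 for x in counts) / len(counts)
print(round(share_at_or_below_q3, 2))  # 0.78
```

With only nine values the share at or below the third quartile cannot be exactly 75%, which is why it comes out at about 78% here.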

14. Box plots

We can visualize quartiles using a box plot. The left edge of the box is the first quartile, the middle line is the median, and the right edge of the box is the third quartile. Extreme values are shown beyond the horizontal lines, such as the dot above 4000.

15. Interquartile range (IQR)

Another measure of spread is the interquartile range, or IQR. It is the distance between the first and third quartiles. For robberies in London, the IQR is around 1,080, meaning the middle 50% of boroughs span a range of about 1,080 robberies. The benefit of the IQR is that it is less affected by extreme values than other measures of spread, such as the standard deviation.
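
To illustrate that robustness, here is a sketch comparing the IQR and the standard deviation on two hypothetical datasets that differ only in their largest value. The helper name `iqr` and the `method="inclusive"` quartile convention are choices made for this example.

```python
import statistics


def iqr(values):
    """Return the third quartile minus the first quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical data: identical except that the largest value becomes extreme.
typical = [10, 20, 30, 40, 50, 60, 70, 80, 90]
with_extreme = [10, 20, 30, 40, 50, 60, 70, 80, 900]

print(iqr(typical), iqr(with_extreme))  # 40.0 40.0

# The standard deviation, by contrast, grows sharply.
std_typical = statistics.pstdev(typical)
std_extreme = statistics.pstdev(with_extreme)
print(std_extreme > std_typical)  # True
```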

16. Let's practice!

Now let's practice measuring spread!