
Point estimate intervals

1. Point estimate intervals

In this chapter, we're switching gears to another important area of data visualization: the visualization of uncertainty.

2. What is "uncertainty"? (a)

What exactly do we mean by "uncertainty"? Uncertainty in statistics is a formalized way of representing how unsure an estimate of some value is. Say you are a farmer with a flock of sheep, and you want to estimate the ratio of black to white sheep in your flock. Your flock is huge, so you can only feasibly count a few sheep, that is, a sample of your population.

3. What is "uncertainty"? (b)

Say you've taken this sample: you count 25 sheep and find 10 black and 15 white, a ratio of 2:3. This is an estimate based on your sample of the population.

4. What is "uncertainty"? (c)

You realize that you may have coincidentally gathered almost all the black sheep at the same time, meaning the true ratio could be much smaller than 2:3. Chance could just as easily have gone the other way and left most of the black sheep out of your sample. This means the true ratio behind your estimate of 2:3 could be closer to 2:30 or to 20:3. Using statistics, you can formalize this uncertainty caused by sampling into ranges of values for your estimate.
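As a rough illustration of turning sampling uncertainty into a range, here is a minimal sketch for the sheep sample. It assumes the normal approximation for a proportion, which is a choice made for this sketch and not a method named in this course, and `proportion_interval` is a hypothetical helper. It also works with the share of black sheep (10 of 25) rather than the black-to-white ratio, because a share is easier to put a range around.

```python
import math


def proportion_interval(successes, n, z=1.96):
    """Approximate range for a sample proportion (normal approximation).

    z=1.96 corresponds to roughly 95% coverage. This formula is an
    assumption of the sketch, not something specified in the course.
    """
    p_hat = successes / n                          # point estimate
    std_err = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error of p_hat
    lower = max(0.0, p_hat - z * std_err)          # a share cannot go below 0
    upper = min(1.0, p_hat + z * std_err)          # or above 1
    return p_hat, lower, upper


# 10 black sheep out of the 25 counted in the sample
p_hat, lower, upper = proportion_interval(10, 25)
print(f"share of black sheep: {p_hat:.2f} (roughly {lower:.2f} to {upper:.2f})")
```

With only 25 sheep the range is wide, which is the point: a small sample leaves a lot of room for the true share to differ from the estimate.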

5. When is uncertainty important?

When do you need to worry about uncertainty? Any time a number you are presenting is an estimate based on a sample of some whole. In our sheep example, the total number of sheep in the flock is too high to look at every one, so you took a sample. Because you used a sample, your estimate has uncertainty. An example of when you don't have uncertainty is a scenario where you have looked at the entire "population": for instance, you found the time to count every sheep in your flock. Another is when you're stating an observation, such as the number of sheep in a given pen. In these scenarios you are reporting a fact, so there is no uncertainty to show.

6. Why is uncertainty important?

Whenever an estimate is presented, especially one used to make a decision, you will naturally want to know how confident or precise that estimate is. If you were going to jump out of a plane with a parachute and you were told the odds of the parachute failing were 1 in 10,000, you'd feel okay. But if you were told that those odds could really be anywhere between 1 in 20,000 and 1 in 10, you would probably re-evaluate your decision. While most estimates you're going to visualize don't have quite the same gravity, it's critical to show the uncertainty of any estimate that has it, to acknowledge the limitations of your data.

7. The confidence interval (a)

The most common type of uncertainty you'll encounter is the confidence interval. While the mathematical description of a confidence interval is beyond the scope of this course, what matters is its interpretation. Let's return to our sheep to define a confidence interval. Say you have a lot of free time, so you decide to go out into your field, grab 50 random sheep, and count the proportion of black to white. After you've done this, you repeat it over and over, taking infinite samples of your flock.

8. The confidence interval (b)

Once you've finished this, you look at all the proportion estimates you gathered and find that 95% of the time they fall between 1 to 6 and 5 to 6 black to white sheep. This is a 95% confidence interval. To put it more concisely: a confidence interval of 95% represents a range of values that your estimate, such as an average, will fall in 95% of the time if you were to sample the population an infinite number of times.
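The repeated-sampling thought experiment above can be sketched in a few lines. Everything specific here is an assumption for illustration: the true share of black sheep (40%), the 10,000 repeats standing in for "infinite", and the helper name `sample_black_share`. The flock is treated as large enough that each sheep drawn is an independent draw.

```python
import random

TRUE_BLACK_SHARE = 0.4   # hypothetical true share of black sheep in the flock
SAMPLE_SIZE = 50         # sheep grabbed on each trip to the field
N_REPEATS = 10_000       # stand-in for "infinite" repeats

rng = random.Random(42)  # fixed seed so the run is repeatable


def sample_black_share(rng, sample_size, true_share):
    """Grab `sample_size` random sheep and return the share that are black."""
    black = sum(rng.random() < true_share for _ in range(sample_size))
    return black / sample_size


# Repeat the sampling many times and sort the resulting estimates
estimates = sorted(
    sample_black_share(rng, SAMPLE_SIZE, TRUE_BLACK_SHARE)
    for _ in range(N_REPEATS)
)

# Cut 2.5% of the estimates off each end to keep the middle 95%
tail = N_REPEATS * 25 // 1000
lower = estimates[tail]
upper = estimates[-tail - 1]

share_inside = sum(lower <= e <= upper for e in estimates) / N_REPEATS
print(f"about {share_inside:.0%} of estimates fall between {lower:.2f} and {upper:.2f}")
```

In practice you only get one sample, so the interval is calculated from that sample with statistics rather than by walking the field thousands of times. The simulation is only meant to make the "95% of the time" reading concrete.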

9. Using hlines() to show intervals

To plot confidence intervals, we will use the hlines() function in matplotlib. Here we draw a 95% confidence interval for each of three estimates. We pass the lower and upper bounds of each interval to the xmin and xmax arguments of hlines(), along with the y-value of the estimate we are plotting. In addition, we draw a small line at the original point estimate for reference.
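A minimal sketch of this kind of plot follows. The three estimates, their bounds, and their labels are made-up numbers for illustration only; they are not taken from the course's dataset, and the styling choices are assumptions.

```python
import matplotlib.pyplot as plt

# Hypothetical point estimates with lower and upper 95% bounds
# (illustrative numbers only, not from the course data)
labels = ["Estimate A", "Estimate B", "Estimate C"]
estimates = [4.0, 6.5, 5.2]
lowers = [3.1, 5.0, 4.6]
uppers = [4.9, 8.0, 5.8]

y_positions = list(range(len(labels)))  # one row per estimate

fig, ax = plt.subplots()

# One horizontal line per interval, running from the lower to the upper bound
ax.hlines(y=y_positions, xmin=lowers, xmax=uppers, linewidth=3, color="steelblue")

# A small vertical tick at each point estimate for reference
ax.plot(estimates, y_positions, "k|", markersize=14)

ax.set_yticks(y_positions)
ax.set_yticklabels(labels)
ax.set_xlabel("Estimated value")
ax.set_title("Point estimates with 95% confidence intervals")
```

Calling `plt.show()` afterwards displays the figure. Drawing the intervals horizontally keeps the estimate labels readable on the y-axis, and intervals that do not overlap are easy to spot at a glance.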

10. Let's show some uncertainty!

Now that you've had a whirlwind tour of confidence intervals, let's put this to work visualizing some uncertainty in our pollution dataset.