1. Interpreting a Confidence Interval
In the last video we came to the conclusion
2. Confidence intervals
that we were 95% confident that the true proportion of Americans that are happy is between 0.705 and 0.841. But what exactly do we mean by "confident"? Let's look deeper into this by starting with the confidence interval that we've already formed.
3. Dataset 1
The data from which this interval was constructed is from 2016, and we can plot both p-hat and the resulting interval on a number line here. To understand what is meant by confident, we need to consider how this interval fits into the big picture.
4. p-values
In classical statistical inference, there is thought to be a fixed but unknown parameter of interest, in this case the population proportion of Americans that are happy.
5. p-values
In 2016, the survey drew a small sample of this population,
6. p-values
calculated p-hat to estimate the parameter, p,
7. p-values
and quantified the uncertainty in that estimate with a confidence interval.
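The estimate-then-interval step just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the video does not say how its interval was computed, so the sketch swaps in the textbook normal-approximation (Wald) formula, p-hat plus or minus 1.96 standard errors. The population proportion TRUE_P, the sample size N, and the helper name wald_interval are all hypothetical and are not taken from the survey.

```python
import math
import random


def wald_interval(successes, n, z=1.96):
    """Return (p_hat, lower, upper) using the normal-approximation (Wald) interval.

    Assumption: this is a stand-in method; the video may use a different one.
    """
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error of p_hat
    return p_hat, p_hat - z * se, p_hat + z * se


# Hypothetical population: in real data the true proportion is unknown.
TRUE_P = 0.8   # assumption, for illustration only
N = 150        # assumption, for illustration only

random.seed(1)
# Draw one sample: each respondent is "happy" with probability TRUE_P.
sample = [random.random() < TRUE_P for _ in range(N)]

p_hat, lower, upper = wald_interval(sum(sample), N)
print(f"p-hat = {p_hat:.3f}, 95% interval = ({lower:.3f}, {upper:.3f})")
```

Running it again with a different seed gives a different p-hat and a different interval, which is the starting point for the thought experiment that follows.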
8. p-values
Now imagine what would happen if we were to draw a new sample
9. p-values
of the same size from that population and come up with a new p-hat and a new interval. It wouldn't be the same as our first, but it'd likely be similar.
10. p-values
We can imagine doing this a third time: a new data sample,
11. p-values
a new p-hat and a new interval.
12. p-values
We can keep this thought experiment going
13. p-values
but what we want to focus on is the properties of this collection of confidence intervals that are accumulating.
14. Dataset 2
While we can't go out right now and knock on doors to collect a new sample of data, we do have data from previous years that we can treat as separate samples. Let's look at the data from 2014 and call it ds2. In that sample, the proportion that are happy is about 0.89.
When we compute a 95% confidence interval, we see it stretches from about 0.83 to 0.94.
15. Dataset 3
We can do this a third time by looking back at the data from 2012, which we'll call ds3.
In this sample,
16. Dataset 3
p-hat is 0.83 and our interval spans from 0.76 to 0.89.
17. Dataset 3
If we were to continue this process many times, we'd get many different p-hats
18. Dataset 3
and many different intervals. These intervals aren't arbitrary: they're designed to capture that unknown population parameter p.
19. Dataset 3
You can see in this plot that almost all of our intervals succeeded in capturing p, but not all of them.
20. Dataset 3
This interval missed the mark. If these are 95% confidence intervals, then across a very large collection of them we'd expect about 95% to capture the parameter and about 5% to miss it.
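The repeated-sampling idea can be checked with a small simulation. The sketch below is not the procedure from the video: it assumes a hypothetical population whose true proportion is known (0.8), a hypothetical sample size (150), and the normal-approximation (Wald) interval as a stand-in for whatever method produced the intervals on the slides. It draws many samples, builds an interval from each, and counts how often the interval captures the true value.

```python
import math
import random


def wald_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) interval; an assumed stand-in method."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


def coverage(true_p, n, n_samples, rng):
    """Fraction of simulated intervals that contain true_p."""
    hits = 0
    for _ in range(n_samples):
        # One simulated survey of size n from the hypothetical population.
        successes = sum(rng.random() < true_p for _ in range(n))
        lower, upper = wald_interval(successes, n)
        if lower <= true_p <= upper:
            hits += 1
    return hits / n_samples


rng = random.Random(2016)  # fixed seed so the run is repeatable
rate = coverage(true_p=0.8, n=150, n_samples=2000, rng=rng)
print(f"Share of intervals capturing p: {rate:.3f}")
```

The printed share should land near 0.95 rather than exactly on it, both because only a finite number of samples is simulated and because the Wald interval is itself an approximation.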
21. Confidence Intervals
This is what is meant by 95% confident. It's a statement about the way that these intervals behave across many samples of data. Another important property of intervals is their width, which is affected by three factors: the sample size, n; the confidence level; and the value of the parameter, p.
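To see how those three factors act on the width, here is a rough sketch that again assumes the normal-approximation (Wald) interval, whose width is 2 times z times the square root of p(1 - p)/n. The function name wald_width and the particular values plugged in are hypothetical; other interval methods behave similarly in direction but not in exact numbers.

```python
import math
from statistics import NormalDist


def wald_width(p, n, confidence=0.95):
    """Width of the normal-approximation interval for proportion p and sample size n."""
    # Critical value z: e.g. roughly 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * z * math.sqrt(p * (1 - p) / n)


base = wald_width(p=0.8, n=150)

# Larger sample size -> narrower interval (quadrupling n halves the width).
print(f"n=150: {base:.3f}   n=600: {wald_width(0.8, 600):.3f}")

# Higher confidence level -> wider interval.
print(f"95%:   {base:.3f}   99%:   {wald_width(0.8, 150, 0.99):.3f}")

# p closer to 0.5 -> wider interval, since p * (1 - p) peaks at 0.5.
print(f"p=0.8: {base:.3f}   p=0.5: {wald_width(0.5, 150):.3f}")
```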
22. Let's practice!
In the following exercises you'll get the chance to explore these factors and how they affect confidence intervals.