Confidence intervals and sampling

1. Confidence intervals and sampling

While point estimates are one tool for inference, a single number will almost never exactly equal the population parameter it estimates. In this lesson, we'll introduce confidence intervals, a different way to make inferences about our population.

2. What is a confidence interval?

A confidence interval takes a sample and uses it to generate a range of values within which we have high confidence that the population parameter we are estimating lies. Suppose a sample of one hundred employees at a company gave an average salary of eighty thousand dollars and a standard deviation of ten thousand dollars. We could use this information to generate a confidence interval for the average salary of all employees at the company. We'll get into the details soon, but the confidence interval shown has been generated by SciPy, which we will cover now.

3. Calculating a confidence interval

Calculating a confidence interval using SciPy requires the sample mean, the sample size and the standard deviation. The confidence interval is centered at the sample mean, which is passed using the "loc" parameter. The "scale" is the standard error, which is the standard deviation divided by the square root of the sample size. Finally, we need to specify what confidence level to use, by using the "alpha" parameter, which newer SciPy releases call "confidence". It's important to check that our sampling distribution is approximately normal before making our confidence interval, because the reliability of the interval depends on it. If it's not, our inference may be invalid. The justification for assuming normality comes from the Central Limit Theorem.
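As a minimal sketch of the calculation just described, we can plug in the salary example from earlier (a sample of one hundred employees, mean eighty thousand, standard deviation ten thousand). The variable names are our own, and the confidence level is passed positionally so the call does not depend on which name the installed SciPy version uses for that parameter.

```python
import math

from scipy import stats

sample_mean = 80_000   # average salary in the sample
sample_std = 10_000    # standard deviation of the sample
n = 100                # sample size

# Standard error: standard deviation divided by the square root of the sample size
standard_error = sample_std / math.sqrt(n)

# First argument is the confidence level (0.95 for a 95-percent interval)
lower, upper = stats.norm.interval(0.95, loc=sample_mean, scale=standard_error)

print(round(lower, 2), round(upper, 2))  # 78040.04 81959.96
```

Here the standard error is one thousand dollars, so the interval reaches roughly 1.96 standard errors on either side of the sample mean.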

4. Central Limit Theorem

All of this statistical machinery is using the central limit theorem in the background. In short, the central limit theorem tells us that if we repeatedly draw independent samples and take the mean of each one, the resulting sampling distribution of the mean will be approximately normal, provided the samples are large enough. We can demonstrate that by repeatedly taking five random numbers chosen from zero to nine and taking their mean. We can then look at the resulting sampling distribution.
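The demonstration can be sketched with NumPy as follows. This is one possible setup, not necessarily the one used in the lesson: we assume the numbers are integers drawn uniformly from zero to nine inclusive, and the number of repetitions and the seed are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, for reproducibility

n_repeats = 10_000   # assumed number of repetitions
sample_size = 5      # five random numbers per sample

# Each row is one sample of five integers from 0 to 9 (the upper bound 10 is exclusive)
samples = rng.integers(low=0, high=10, size=(n_repeats, sample_size))

# One mean per sample: together these form the sampling distribution
sample_means = samples.mean(axis=1)

print(sample_means.mean())  # close to 4.5, the mean of the digits 0 to 9
print(sample_means.std())   # spread of the sampling distribution
```

Plotting a histogram of `sample_means` is a natural next step for checking how close the shape is to a normal curve.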

5. Central Limit Theorem - Plot

Once we graph the sampling distribution we see that it is approximately normal. Now, if we wanted just the middle ninety-five percent of this data, we would see that ninety-five percent of the means lie between two-point-six and seven-point-two. This is essentially the calculation SciPy does for us when creating a 95-percent confidence interval, except that it works from the normal distribution rather than from simulated means.
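To see how the middle ninety-five percent relates to what SciPy computes, here is a rough sketch that compares the two. It regenerates a sampling distribution under our own assumptions (uniform integers from zero to nine, ten thousand repetitions, arbitrary seed), so the endpoints it prints depend on the random draws and need not match the figures read off the plot.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # arbitrary seed

# Sampling distribution: means of 10,000 samples of five integers from 0 to 9
sample_means = rng.integers(0, 10, size=(10_000, 5)).mean(axis=1)

# Empirical middle 95 percent: cut off 2.5 percent in each tail
emp_low, emp_high = np.percentile(sample_means, [2.5, 97.5])

# Normal approximation with the same center and spread
norm_low, norm_high = stats.norm.interval(
    0.95, loc=sample_means.mean(), scale=sample_means.std()
)

print("empirical:", emp_low, emp_high)
print("normal approximation:", norm_low, norm_high)
```

If the sampling distribution really is close to normal, the two pairs of endpoints should be close to each other.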

6. Samples and confidence intervals

Finally, our confidence interval comes from the mean and standard error of our sample. Therefore, different samples will yield different confidence intervals, and thus potentially different conclusions. In many cases, populations are incredibly diverse. Imagine a city with wealthy and poor people, immigrants and native citizens, different races, ethnicities, religions and political views. Taking a sample that represents this broad diversity of people is incredibly difficult. But failing to do so results in sampling bias, and leads to a confidence interval whose conclusions do not apply to the entire population.
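The effect of sampling bias can be illustrated with a small simulation. Everything here is hypothetical: the population is made up of two invented income groups, `mean_ci` is a helper of our own, and the group sizes, incomes and seed are arbitrary. The point is only to show how an interval built from one group can miss the mean of the whole population.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)  # arbitrary seed

# Hypothetical population made of two groups with different income levels
group_a = rng.normal(loc=40_000, scale=5_000, size=50_000)
group_b = rng.normal(loc=90_000, scale=15_000, size=50_000)
population = np.concatenate([group_a, group_b])


def mean_ci(sample, level=0.95):
    """Normal-based confidence interval for the mean of a sample."""
    standard_error = sample.std(ddof=1) / np.sqrt(len(sample))
    return stats.norm.interval(level, loc=sample.mean(), scale=standard_error)


# A random sample from the whole population vs. a sample from one group only
random_sample = rng.choice(population, size=500, replace=False)
biased_sample = rng.choice(group_b, size=500, replace=False)

print("population mean: ", population.mean())
print("random sample CI:", mean_ci(random_sample))
print("biased sample CI:", mean_ci(biased_sample))
```

The interval from the biased sample is narrow and looks trustworthy, yet it sits far above the population mean, because it only describes the group it was drawn from.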

7. What a confidence interval tells us

While we can learn a lot from the confidence interval, there are some things it does not tell us. We cannot say something like "there is a 95 percent chance the population parameter is in the confidence interval." Either it is in the interval or it is not, and we can't say more than that. Instead, we know that if we took many samples and constructed a confidence interval each time, then about 95 percent of those intervals would contain the population parameter.
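This repeated-sampling interpretation can be checked with a simulation. The sketch below assumes a hypothetical normal population whose true mean we know, draws many samples, builds a 95-percent interval from each, and counts how often the interval contains the true mean. The population values, number of trials and seed are our own choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=7)  # arbitrary seed

true_mean, true_std = 80_000, 10_000  # hypothetical population values
n = 100                               # size of each sample
n_trials = 2_000                      # number of samples to draw

z = stats.norm.ppf(0.975)  # critical value for a 95-percent interval
hits = 0

for _ in range(n_trials):
    sample = rng.normal(true_mean, true_std, size=n)
    standard_error = sample.std(ddof=1) / np.sqrt(n)
    lower = sample.mean() - z * standard_error
    upper = sample.mean() + z * standard_error
    # Count this interval if it contains the true population mean
    hits += int(lower <= true_mean <= upper)

coverage = hits / n_trials
print(coverage)  # should land near 0.95
```

Any single interval either contains the true mean or it doesn't. The 95 percent describes how often the procedure succeeds across many samples.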

8. Let's practice!

Now that we see how confidence intervals fit in with sampling, let's practice!
