The central limit theorem

1. The central limit theorem

Now that we're familiar with the normal distribution, it's time to learn about what makes it so important.

2. Rolling a die five times

Let's go back to our die rolling example. We can roll a fair six-sided die five times and record the results. Now, we'll take the mean of the five rolls, which gives us two.
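As a minimal sketch of this step, assuming NumPy is available, we can simulate five rolls and take their mean. The seed and the variable names are illustrative choices, not part of the original example, so the mean we get will generally differ from the one in the video.

```python
import numpy as np

# The seed is arbitrary; it only makes the run reproducible
rng = np.random.default_rng(seed=42)

# Five rolls of a fair six-sided die: integers 1 to 6, each equally likely
# (the upper bound 7 is exclusive)
rolls = rng.integers(low=1, high=7, size=5)

# The mean of this one set of five rolls
sample_mean = rolls.mean()

print("rolls:", rolls)
print("mean of the five rolls:", sample_mean)
```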

3. Rolling a die five times

If we roll another five times and take the mean, we get a different mean. And if we do it again, we get another mean.

4. 10 sets of five die rolls

We can repeat this 10 times: we'll roll a die five times, take the mean of that set of five rolls, and repeat 10 times. Here are the results, showing the mean from each set of five rolls.
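One way to sketch the repetition, again assuming NumPy, is to draw all the rolls at once as a table with one row per set. The names `rolls` and `sample_means` are our own, and the simulated means will not match the ones shown on the slide.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed for reproducibility

# 10 rows, where each row is one set of five die rolls
rolls = rng.integers(1, 7, size=(10, 5))

# Averaging across each row gives one sample mean per set
sample_means = rolls.mean(axis=1)

print(sample_means)
```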

5. Sampling distributions

We can plot these means too. A distribution of a summary statistic such as the mean is called a sampling distribution. This distribution, specifically, is a sampling distribution of the sample mean.

6. 100 sample means

We can repeat this process to take 100 sample means. If we look at the new sampling distribution, its shape somewhat resembles the normal distribution, even though the distribution of outcomes for each individual die roll is uniform.

7. 1000 sample means

Let's take 1000 means. This sampling distribution more closely resembles the normal distribution.

8. 10000 sample means

The shape stays consistent at ten thousand sample means,

9. 100000 sample means

one hundred thousand sample means,

10. One million sample means

and one million sample means!
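To see this stability without a plot, here is a rough sketch that summarizes the sample means by their own mean and standard deviation for growing numbers of sample means. It stops at one hundred thousand to keep the run light, and the theoretical values in the comments follow from a fair die having mean 3.5 and variance 35/12.

```python
import numpy as np

rng = np.random.default_rng(seed=123)  # arbitrary seed

# Theoretical values for the mean of five fair die rolls
theoretical_mean = 3.5
theoretical_sd = np.sqrt(35 / 12 / 5)  # one roll has variance 35/12

results = {}
for n_means in (100, 1_000, 10_000, 100_000):
    # n_means sets of five rolls, reduced to one mean per set
    means = rng.integers(1, 7, size=(n_means, 5)).mean(axis=1)
    results[n_means] = (means.mean(), means.std())
    print(f"{n_means:>7} sample means: mean={means.mean():.3f}, sd={means.std():.3f}")

print(f"theoretical: mean={theoretical_mean:.3f}, sd={theoretical_sd:.3f}")
```

With more sample means, the two summaries should settle near the theoretical values rather than keep drifting, which is the numeric counterpart of the shape staying consistent.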

11. Central limit theorem

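This phenomenon is known as the central limit theorem, which states that the sampling distribution of a statistic such as the mean will approach a normal distribution as the size of each sample increases. Two quantities are at play in our example: the sample size, which was five rolls per mean, and the number of sample means we collected. Taking more and more sample means doesn't change the underlying sampling distribution; it gives us a clearer picture of its shape, which for die rolls is already close to normal at a sample size of five. It's important to note that the central limit theorem only applies when samples are taken randomly and are independent, for example, randomly picking sales deals with replacement. As a rule of thumb, a sample size of at least 30 is often recommended for the normal approximation to be reliable, although a smaller size can be enough for a symmetric distribution like the die roll.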
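The effect of the sample size can be checked with a small sketch. Here we use excess kurtosis as a rough measure of distance from normality (it is 0 for a normal distribution); this choice of measure, the helper `excess_kurtosis`, and the sample sizes 1, 5, and 30 are our own assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed
n_samples = 100_000  # number of sample means drawn per sample size

def excess_kurtosis(x):
    """Fourth standardized moment minus 3; equals 0 for a normal distribution."""
    z = (x - x.mean()) / x.std()
    return (z ** 4).mean() - 3.0

kurt = {}
for sample_size in (1, 5, 30):
    # Each row is one sample of `sample_size` rolls, reduced to its mean
    means = rng.integers(1, 7, size=(n_samples, sample_size)).mean(axis=1)
    kurt[sample_size] = excess_kurtosis(means)
    print(f"sample size {sample_size:>2}: excess kurtosis = {kurt[sample_size]:.3f}")
```

The excess kurtosis should move toward 0 as the sample size grows, while the number of sample means is held fixed.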

12. Standard deviation and the CLT

The central limit theorem, or CLT, applies to other summary statistics as well. If we take the standard deviation of five rolls 100,000 times, the sample standard deviations are approximately normally distributed, centered close to 1.7, which is roughly the standard deviation of a single roll of a fair die (the square root of 35/12, about 1.71).
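A sketch of this experiment, assuming NumPy, is below. It computes the standard deviation of a single roll from the six equally likely faces and compares it with the center of the simulated sample standard deviations. Using `ddof=1` for the samples is our assumption about how the sample standard deviation is defined here.

```python
import numpy as np

rng = np.random.default_rng(seed=2)  # arbitrary seed

# Standard deviation of one fair die roll, from the six equally likely faces
faces = np.arange(1, 7)
population_sd = faces.std()  # ddof=0 (population formula): sqrt(35/12)

# 100,000 samples of five rolls each
rolls = rng.integers(1, 7, size=(100_000, 5))

# ddof=1 gives the usual sample standard deviation for each row
sample_sds = rolls.std(axis=1, ddof=1)

print(f"standard deviation of one roll: {population_sd:.3f}")
print(f"mean of the sample sds:         {sample_sds.mean():.3f}")
```

With samples this small, the sample standard deviations tend to center slightly below the standard deviation of a single roll, so the two printed values should be close but not identical.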

13. Proportions and the CLT

Another statistic that the central limit theorem applies to is proportion. Let's sample from a die five times with replacement and see what proportion of the rolls are a four. In this case, 20% of rolls are a four. If we sample again, 60% of rolls are a four.

14. Sampling distribution of proportion

If we repeat this 1000 times and plot the distribution of the proportion of fours rolled in each sample, it resembles a normal distribution centered around 0.17, since there is a one in six chance of rolling a four.
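The proportion experiment can be sketched the same way. The trick of averaging a True/False array to get a proportion is a standard NumPy idiom; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=3)  # arbitrary seed

# 1000 samples of five rolls each
rolls = rng.integers(1, 7, size=(1_000, 5))

# (rolls == 4) is True where a four was rolled; its row mean is the proportion of fours
prop_fours = (rolls == 4).mean(axis=1)

print("first few sample proportions:", prop_fours[:5])
print(f"mean of sample proportions: {prop_fours.mean():.3f} (1/6 = {1/6:.3f})")
```

Each sample proportion can only be 0, 0.2, 0.4, 0.6, 0.8, or 1 with five rolls, so the histogram is coarse, but its center should sit near one in six.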

15. Mean of the sampling distribution

Since these sampling distributions are approximately normal, we can take their mean to get an estimate of a distribution's mean, standard deviation, or proportion. Here we can see our distribution of one thousand sample means, with a red dotted line showing the theoretical mean of a die roll, 3.5. We've also included the mean of the sample means as a green dotted line, which is 3.53. This is an example of the law of large numbers in action! With our die rolling examples, we know the underlying distribution's mean, standard deviation, and proportions, but when we don't, this can be a useful method for estimating characteristics of an underlying distribution.

16. Benefits of the central limit theorem

The central limit theorem also comes in handy when we have a huge population and don't have the time or resources to collect data on everyone. Instead, we can collect smaller samples and create a sampling distribution to estimate summary statistics.
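As a rough illustration of this idea, the sketch below builds a made-up population and estimates its mean from a small fraction of it. The population (one million values from an exponential distribution with scale 50), the 200 samples, and the sample size of 30 are all hypothetical choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(seed=7)  # arbitrary seed

# Hypothetical population: one million skewed values (not from the original example)
population = rng.exponential(scale=50.0, size=1_000_000)

# Draw 200 random samples of 30 values each, with replacement
samples = rng.choice(population, size=(200, 30), replace=True)

# One mean per sample; together they form the sampling distribution of the mean
sample_means = samples.mean(axis=1)

# The mean of the sampling distribution is our estimate of the population mean
estimate = sample_means.mean()

print(f"estimate from 6,000 sampled values: {estimate:.2f}")
print(f"actual population mean:             {population.mean():.2f}")
```

In practice we would not be able to compute the last line, which is the point: the estimate uses well under one percent of the population.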

17. Let's practice!

Now, it's time to practice utilizing the central limit theorem.