1. The central limit theorem
Now that we're familiar with the normal distribution, it's time to learn about what makes it so important.
2. Rolling a die five times
Let's go back to our dice rolling example.
We can roll a fair six-sided die five times and record the results.
Now, we'll take the mean of the five rolls, which gives us two.
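As a minimal sketch of this step, we can simulate the five rolls with Python's standard library. The seed and the variable names are our own choices for illustration, so the simulated mean will not necessarily match the value on the slide.

```python
import random
import statistics

# Seeded generator so the sketch is reproducible; the seed is arbitrary.
rng = random.Random(42)

# Roll a fair six-sided die five times (each roll is independent).
rolls = [rng.randint(1, 6) for _ in range(5)]

# The sample mean of this one set of five rolls.
sample_mean = statistics.mean(rolls)

print(rolls, sample_mean)
```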
3. Rolling a die five times
If we roll another five times and take the mean, we get a different mean.
And if we do it again, we get another mean.
4. 10 sets of five die rolls
We can repeat this 10 times: we'll roll a die five times, take the mean of that set of five rolls, and repeat 10 times.
Here are the results, showing the mean from each set of five rolls.
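Repeating the process can be sketched with a small helper. The function name `mean_of_rolls` is hypothetical, and the seed is arbitrary.

```python
import random
import statistics

rng = random.Random(0)  # arbitrary seed, for reproducibility only

def mean_of_rolls(n_rolls=5):
    """Roll a fair die n_rolls times and return the mean of the rolls."""
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    return statistics.mean(rolls)

# Ten sets of five rolls give ten sample means.
sample_means = [mean_of_rolls() for _ in range(10)]

print(sample_means)
```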
5. Sampling distributions
We can plot these means too.
A distribution of a summary statistic such as the mean is called a sampling distribution. This distribution, specifically, is a sampling distribution of the sample mean.
6. 100 sample means
We can repeat this process to take 100 sample means.
If we look at the new sampling distribution, its shape somewhat resembles the normal distribution, even though the distribution of outcomes for each individual die roll is uniform.
7. 1000 sample means
Let's take 1000 means.
This sampling distribution more closely resembles the normal distribution.
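To get a rough look at this shape without a plotting library, we can tally the sample means and print a text histogram. This is only a sketch: the seed, the constant names, and the bar scaling are our own assumptions.

```python
import random
from collections import Counter

rng = random.Random(1)  # arbitrary seed, for reproducibility only

N_SAMPLES = 1000  # number of sample means to take
N_ROLLS = 5       # rolls per sample

# Tally the total of each set of five rolls; the sample mean is total / 5.
totals = Counter(
    sum(rng.randint(1, 6) for _ in range(N_ROLLS)) for _ in range(N_SAMPLES)
)

# Print one '#' for every five samples that share the same mean.
for total in sorted(totals):
    print(f"{total / N_ROLLS:.1f} {'#' * (totals[total] // 5)}")
```

The bars should be tallest near the middle and taper off toward 1 and 6, even though each individual roll is uniform.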
8. 10000 sample means
The shape stays consistent at ten thousand sample means,
9. 100000 sample means
one hundred thousand sample means,
10. One million sample means
and one million sample means!
11. Central limit theorem
This phenomenon is known as the central limit theorem, which states that the sampling distribution of a statistic such as the mean will approach a normal distribution as the size of each sample increases.
In our example, each sample had only five rolls, yet the sampling distribution already looked roughly normal, and taking more and more sample means made that shape clearer.
It's important to note that the central limit theorem only applies when samples are taken randomly and are independent, for example, randomly picking sales deals with replacement.
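As a rule of thumb, a sample size of at least 30 is recommended for the central limit theorem to apply, although for a symmetric distribution like a fair die's, the sampling distribution of the mean can look roughly normal with smaller samples.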
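One rough way to check how normal a sampling distribution looks is to use a property of the normal distribution: about 68% of values fall within one standard deviation of the mean. The sketch below applies that check to means of 30 independent rolls. The choice of check, the number of repetitions, and the names are our own assumptions, not part of the original example.

```python
import random
import statistics

rng = random.Random(2)  # arbitrary seed, for reproducibility only

def sample_mean(n_rolls):
    """Mean of n_rolls independent rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n_rolls)) / n_rolls

# 10,000 sample means, each from a sample of size 30.
means = [sample_mean(30) for _ in range(10_000)]

center = statistics.mean(means)
spread = statistics.pstdev(means)

# Fraction of sample means within one standard deviation of the center.
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)

print(round(center, 3), round(within_one_sd, 3))
```

A fraction in the neighborhood of 0.68 is consistent with a roughly normal shape.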
12. Standard deviation and the CLT
The central limit theorem, or CLT, applies to other summary statistics as well.
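If we take the standard deviation of five rolls 100,000 times, the sample standard deviations are approximately normally distributed, centered around 1.9, which is close to the standard deviation of the die's six outcomes (about 1.87 using the sample formula, or about 1.71 using the population formula).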
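The same idea can be sketched for the standard deviation. We assume here that the sample (n - 1) formula is used, which is what `statistics.stdev` computes, and we use fewer repetitions than in the example to keep the sketch quick. The helper name `sample_sd` is hypothetical.

```python
import random
import statistics

rng = random.Random(3)  # arbitrary seed, for reproducibility only

def sample_sd(n_rolls=5):
    """Sample standard deviation (n - 1 formula) of n_rolls die rolls."""
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    return statistics.stdev(rolls)

# 10,000 repetitions here (an assumption to keep the run short).
sample_sds = [sample_sd() for _ in range(10_000)]
mean_of_sds = statistics.mean(sample_sds)

# Population standard deviation of the six faces, for comparison.
die_sd = statistics.pstdev([1, 2, 3, 4, 5, 6])

print(round(mean_of_sds, 2), round(die_sd, 2))
```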
13. Proportions and the CLT
Another statistic that the central limit theorem applies to is proportion.
Let's sample from a die five times with replacement and see how many times we roll a four. In this case, 20% of rolls were a four. If we sample again, 60% of rolls are a four.
14. Sampling distribution of proportion
If we repeat this 1000 times and plot the proportion of fours rolled in each sample, the distribution resembles a normal distribution centered around 0.17, since there is a one in six chance of rolling a four.
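Here is a minimal sketch of the sampling distribution of the proportion. The helper name `proportion_of_fours` and the seed are our own choices.

```python
import random
import statistics

rng = random.Random(4)  # arbitrary seed, for reproducibility only

def proportion_of_fours(n_rolls=5):
    """Proportion of fours in n_rolls independent die rolls."""
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    return rolls.count(4) / n_rolls

# 1000 samples of five rolls, one proportion per sample.
proportions = [proportion_of_fours() for _ in range(1000)]
mean_proportion = statistics.mean(proportions)

# Compare against the theoretical chance of rolling a four, 1/6.
print(round(mean_proportion, 3), round(1 / 6, 3))
```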
15. Mean of the sampling distribution
Since these sampling distributions are approximately normal, we can take their mean to get an estimate of a distribution's mean, standard deviation, or proportion.
Here we can see our distribution of one thousand sample means, with a red dotted line showing the theoretical mean of a die roll, 3.5. We've also included the mean of the sample means, 3.53, as a green dotted line. This is an example of the law of large numbers in action!
In our die-rolling examples, we already know the underlying distribution's mean, standard deviation, and proportions, but when we don't, this can be a useful method for estimating those characteristics.
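As a sketch of this estimation step, we can take the mean of one thousand simulated sample means and compare it with the theoretical mean of a die roll. The seed is arbitrary, so the estimate will differ slightly from the value quoted above.

```python
import random
import statistics

rng = random.Random(5)  # arbitrary seed, for reproducibility only

# One thousand sample means, each from five die rolls.
sample_means = [
    statistics.mean([rng.randint(1, 6) for _ in range(5)])
    for _ in range(1000)
]

# The mean of the sampling distribution estimates the underlying mean.
estimate = statistics.mean(sample_means)

# Theoretical mean of a fair die: the average of its six faces.
theoretical = statistics.mean(range(1, 7))

print(round(estimate, 2), theoretical)
```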
16. Benefits of the central limit theorem
The central limit theorem also comes in handy when we have a huge population and don't have the time or resources to collect data on everyone.
Instead, we can collect smaller samples and create a sampling distribution to estimate summary statistics.
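To illustrate, the sketch below builds a hypothetical large population of skewed values (the exponential shape and its scale are assumptions made purely for the demonstration), then estimates its mean from small random samples drawn with replacement.

```python
import random
import statistics

rng = random.Random(7)  # arbitrary seed, for reproducibility only

# Hypothetical skewed population of 100,000 values (assumed for illustration).
population = [rng.expovariate(1 / 50) for _ in range(100_000)]

# Draw 200 small samples of size 30, with replacement, and record each mean.
sample_means = [
    statistics.fmean(rng.choices(population, k=30)) for _ in range(200)
]

# Estimate the population mean from the sampling distribution.
estimate = statistics.fmean(sample_means)

# In practice we would not have this; it is computed only to check the sketch.
true_mean = statistics.fmean(population)

print(round(estimate, 1), round(true_mean, 1))
```

Only 6,000 values are drawn in total, yet the estimate should land close to the mean of the full population.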
17. Let's practice!
Now, it's time to practice utilizing the central limit theorem.