
The normal distribution

1. The normal distribution

The next probability distribution we'll discuss is the normal distribution, which is a continuous probability distribution. It's one of the most important probability distributions we'll learn about since several statistical methods rely on it, and it applies to more real-world situations than the distributions we've covered so far.

2. What is the normal distribution?

The normal distribution looks like this. Its shape is commonly referred to as a bell curve. This distribution has a few important properties.

3. Symmetrical

First, it's symmetrical, so the left side is a mirror image of the right.

4. Area = 1

Second, just like any probability distribution, the area beneath the curve equals one.

5. Curve never hits 0

Third, the probability never hits zero, even if it looks like it does at the tail ends. For example, the chance of getting a value above 10 in this distribution is less than 0.5 percent, but it is possible!
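These three properties can be checked numerically. The sketch below uses `NormalDist` from Python's standard library and assumes a standard normal (mean zero, standard deviation one), not the specific curve shown on the slide, whose parameters aren't stated in this transcript.

```python
from statistics import NormalDist

# Assumption: a standard normal, chosen only for illustration.
dist = NormalDist(mu=0, sigma=1)

# Symmetry: the curve has the same height at -x and +x.
symmetric = all(abs(dist.pdf(-x) - dist.pdf(x)) < 1e-12 for x in (0.5, 1.0, 2.5))

# Area = 1: approximate the area under the curve with a midpoint sum
# over -8 to 8, where almost all of the area lies.
step = 0.001
n_steps = 16_000
area = sum(dist.pdf(-8 + (i + 0.5) * step) for i in range(n_steps)) * step

# Tails never reach zero: far-out values are rare but still possible.
tail_beyond_3 = 1 - dist.cdf(3)   # probability of a value above 3
height_at_6 = dist.pdf(6)         # tiny, but still greater than zero

print(symmetric, round(area, 6), tail_beyond_3, height_at_6)
```

The tail probability above three comes out at a little over a tenth of a percent, and the curve's height at six is tiny but still positive.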

6. Described by mean and standard deviation

The normal distribution is described by its mean and standard deviation. Here is a normal distribution with a mean of 20 and standard deviation of three, and here is a normal distribution with a mean of zero and a standard deviation of one. Notice how both distributions have the same shape, but their axes have different scales.
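One way to see that the two distributions share a shape is to convert a value to a z-score, which is the number of standard deviations it sits from the mean. The sketch below assumes the two distributions mentioned here; the value 23 and the helper `z_score` are made up for illustration.

```python
from statistics import NormalDist

wide = NormalDist(mu=20, sigma=3)      # mean 20, standard deviation 3
standard = NormalDist(mu=0, sigma=1)   # mean 0, standard deviation 1

def z_score(x, dist):
    """Number of standard deviations that x lies from the mean of dist."""
    return (x - dist.mean) / dist.stdev

x = 23                 # hypothetical value, one standard deviation above 20
z = z_score(x, wide)   # the same position on the standard normal's scale

# The area to the left of x in one distribution matches the area
# to the left of z in the other: same shape, different axis scale.
p_wide = wide.cdf(x)
p_standard = standard.cdf(z)

print(z, p_wide, p_standard)
```

Both probabilities agree, which is why results worked out on the standard normal carry over to any other normal distribution once values are rescaled.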

7. Areas under the normal distribution

For the normal distribution, about 68% of the area is within one standard deviation of the mean.

8. Areas under the normal distribution

95% of the area falls within two standard deviations of the mean,

9. Areas under the normal distribution

and 99.7% of the area falls within three standard deviations. This is sometimes called the 68-95-99.7 rule.
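The rule can be reproduced from the cumulative distribution function. This is a minimal sketch using the standard library's `NormalDist`; the helper name `area_within` is made up for illustration.

```python
from statistics import NormalDist

def area_within(k, dist=NormalDist()):
    """Area under dist within k standard deviations of its mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

# Check the rule on the standard normal ...
areas = {k: area_within(k) for k in (1, 2, 3)}

# ... and on a normal with mean 20 and standard deviation 3.
areas_other = {k: area_within(k, NormalDist(20, 3)) for k in (1, 2, 3)}

for k in (1, 2, 3):
    print(f"within {k} sd: {areas[k]:.4f} and {areas_other[k]:.4f}")
```

The areas come out at roughly 0.6827, 0.9545 and 0.9973 whichever mean and standard deviation are used, so the percentages in the rule are rounded values.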

10. Why is the normal distribution important?

So why is the normal distribution important? Firstly, lots of real-world data closely resembles the normal distribution. For example, here is a histogram showing the percentage of schools in each region of the United Kingdom achieving pass grades for end of secondary school exams. If we draw a line over the histogram, its shape closely resembles the normal distribution. Secondly, many of the statistical tests used in hypothesis testing, such as comparing the mean of a sample to the population it represents, assume that our data follows a normal distribution.

11. Skewness

When interpreting the distribution of data we often use the term skewness, which describes the direction in which the data tails off. For example, the plot on the left peaks on the left and tails off to the right, so we describe the distribution as positively skewed, or right skewed, as the tail is on the right where larger positive values are. Conversely, a negatively skewed, or left skewed, distribution peaks on the right and tails off to the left. It is very common to observe skewness in real-world data such as household income, which is typically positively skewed because some households earn much more than the typical income.
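Skewness can also be measured as a number, where the sign gives the direction of the tail. The sketch below uses one common moment-based definition (the third central moment divided by the standard deviation cubed); the function name and the income figures are made up for illustration.

```python
from statistics import fmean

def skewness(values):
    """Moment-based skewness: positive for a right tail, negative for a left tail."""
    mean = fmean(values)
    m2 = fmean([(v - mean) ** 2 for v in values])  # second central moment
    m3 = fmean([(v - mean) ** 3 for v in values])  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical household incomes (in thousands): a few very high earners.
right_skewed = [20, 22, 25, 27, 30, 32, 35, 40, 60, 150]

# Mirror the same data so the long tail points left instead.
left_skewed = [170 - v for v in right_skewed]

symmetric = [1, 2, 3, 4, 5]

print(skewness(right_skewed), skewness(left_skewed), skewness(symmetric))
```

The income-like data gives a positive value, its mirror image gives a negative value, and the symmetric data gives zero.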

12. Kurtosis

We can also interpret a distribution by its kurtosis, which is a way of describing the occurrence of extreme values in a distribution. There are three types of kurtosis.

13. Kurtosis

The first is positive kurtosis, also known as leptokurtic, which is characterized by a sharper peak around the mean and heavier tails than the normal distribution, so extreme values occur more often, shown here in red. A mesokurtic distribution is the term used to describe the normal distribution, which is shown as the blue curve in the plot. Lastly, negative kurtosis, also known as platykurtic, describes a distribution with a lower, flatter peak and lighter tails, so extreme values occur less often, as highlighted in green here.
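Kurtosis can be put into numbers too. The sketch below uses the common "excess kurtosis" convention, where the normal distribution scores zero, positive values mean more extreme values than a normal distribution, and negative values mean fewer. The function name and the sample data are made up for illustration.

```python
from statistics import NormalDist, fmean

def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3 (the normal's value)."""
    mean = fmean(values)
    m2 = fmean([(v - mean) ** 2 for v in values])  # variance
    m4 = fmean([(v - mean) ** 4 for v in values])  # fourth central moment
    return m4 / m2 ** 2 - 3

# Mostly small values plus two extreme ones: positive kurtosis (leptokurtic).
heavy_tailed = [-10, -1, -1, 0, 0, 0, 0, 1, 1, 10]

# Evenly spread values with no extremes: negative kurtosis (platykurtic).
flat = list(range(1, 101))

# Random draws from a normal distribution: close to zero (mesokurtic).
normal_like = NormalDist(0, 1).samples(50_000, seed=42)

for name, data in [("heavy", heavy_tailed), ("flat", flat), ("normal", normal_like)]:
    print(name, round(excess_kurtosis(data), 2))
```

The sample with two extreme values scores above zero, the evenly spread sample scores below zero, and the normal draws land near zero.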

14. Let's practice!

Time to check our understanding of how to interpret the shape of a distribution!