
The Normal distribution: Properties and warnings

1. The Normal distribution: Properties and warnings

Here is the

2. Image: Deutsche Bundesbank

10 Deutschmark bill. It was retired in 2002, when Germany adopted the euro. The man pictured on this bill is Carl Friedrich Gauss, one of the greatest mathematicians of all time. Clearly, the Germans thought highly enough of Gauss to put him on their currency.

3. The Gaussian distribution

If we zoom in to the center of the bill, we see which of Gauss's barrelful of accomplishments they think most highly of. There it is: the Normal distribution! It is also often called the Gaussian distribution, after Gauss, and you will frequently hear it referred to that way. The Normal distribution is very important and very widely used, so it is worth discussing further. In practice, it is used to describe most of the symmetric, peaked data you will encounter. Furthermore, many of the statistical procedures you have heard of assume that the data are Normally distributed. It is a very powerful distribution that seems to be ubiquitous in nature, not just in the field of statistics. That said, there are important caveats about the distribution, and we need to be careful when using it. First, things you might think are Normally distributed often are not.

4. Length of MA largemouth bass

Consider, for example, largemouth bass in Massachusetts lakes, measured in 1994 and 1995 by the Massachusetts Department of Environmental Protection. If we look at a histogram of the lengths of the 316 fish they measured, the lengths appear to be Normally distributed. Indeed, when we look at

5. Length of MA largemouth bass

the ECDF overlaid with a theoretical Normal CDF, the measurements look close to Normally distributed. There are some systematic differences, though,

6. Length of MA largemouth bass

especially in the left tail. So this is not quite a Normal distribution, but we might not be making too big of an error by treating it as such.
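The ECDF-versus-CDF comparison described above can be sketched in plain Python. This is a minimal sketch, not the course's code: it uses the standard library's `statistics.NormalDist` (Python 3.8+) for the theoretical CDF, with mean and standard deviation estimated from the data, and it summarizes the overlay as a single number, the largest vertical gap between the two curves (the Kolmogorov-Smirnov distance), instead of drawing a plot. The samples are hypothetical simulated stand-ins, not the Massachusetts bass measurements, and the function names `ecdf` and `max_ecdf_gap` are illustrative.

```python
import random
import statistics


def ecdf(data):
    """Return (x, y) for the empirical CDF: sorted values and fractions (i+1)/n."""
    n = len(data)
    x = sorted(data)
    y = [(i + 1) / n for i in range(n)]
    return x, y


def max_ecdf_gap(data):
    """Largest vertical gap between the ECDF and a Normal CDF whose mean and
    standard deviation are estimated from the same data."""
    model = statistics.NormalDist(statistics.fmean(data), statistics.stdev(data))
    x, y = ecdf(data)
    n = len(x)
    gap = 0.0
    for i, xi in enumerate(x):
        f = model.cdf(xi)
        # The ECDF steps from i/n up to (i+1)/n at xi, so check both sides of the step.
        gap = max(gap, abs(y[i] - f), abs(f - i / n))
    return gap


rng = random.Random(42)
# Hypothetical stand-in data (NOT the Massachusetts measurements):
symmetric = [rng.gauss(40.0, 5.0) for _ in range(316)]       # drawn from a Normal
skewed = [rng.lognormvariate(0.0, 0.8) for _ in range(316)]  # clearly non-Normal

print(f"symmetric sample: {max_ecdf_gap(symmetric):.3f}")
print(f"skewed sample:    {max_ecdf_gap(skewed):.3f}")
```

A small gap corresponds to an overlay that looks close, as with the bass lengths; a clearly non-Normal sample leaves a much larger gap, the kind of mismatch a "not even close" overlay reflects. A plot still shows *where* the curves disagree (for example, in one tail), which a single number cannot.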

7. Mass of MA largemouth bass

Now, let's consider the mass of the bass. One might think that since the lengths of these bass are close to Normally distributed, their masses should be as well. But when we overlay the theoretical Normal CDF on the ECDF of the mass data, it is not even close. This immediately shows us that our initial thought was incorrect. Another important issue to keep in mind when using the Normal distribution is

8. Light tails of the Normal distribution

the lightness of its tails. If we look at the Normal distribution, the probability of being

9. Light tails of the Normal distribution

more than four standard deviations from the mean is very small, about 0.006%. This means that when you model data as Normally distributed, outliers are extremely unlikely. Real data sets often have extreme values, and when they do, the Normal distribution might not be the best description of your data.
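The light-tail claim can be made concrete. A minimal sketch, assuming only the Normal model itself: the probability of landing more than k standard deviations from the mean is 2(1 − Φ(k)), where Φ is the standard Normal CDF, and this equals erfc(k/√2). The helper name `two_sided_tail` is illustrative, and reusing the 316 fish count is just a convenient sample size, not an analysis of the bass data.

```python
import math


def two_sided_tail(k):
    """P(|X - mu| > k * sigma) for a Normal random variable X."""
    # By symmetry this is 2 * (1 - Phi(k)); math.erfc avoids the loss of
    # precision from subtracting a number very close to 1.
    return math.erfc(k / math.sqrt(2))


for k in range(1, 5):
    print(f"beyond {k} SD: {two_sided_tail(k):.2e}")

# Expected number of points beyond 4 SD in a data set of n Normal observations.
n = 316
print(f"expected points beyond 4 SD among {n}: {n * two_sided_tail(4):.3f}")
```

For k = 1 through 4 this gives roughly 0.32, 0.046, 0.0027 and 0.000063. With 316 observations the expected number of points beyond four standard deviations is only about 0.02, so finding even one such point in a data set of that size would be surprising under the Normal model.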

10. Let's practice!

These caveats are not meant to scare you. I bring them up to remind you to always think carefully about assumptions that go into your analyses. The Normal distribution is still of great use, and use it you will in the next exercises!