Normal distributions

1. Normal distributions

It's time to study the most important and widely used type of distribution in probability and statistics: the normal distribution.

2. Modeling for measures

Normal distributions, also called Gaussian distributions after the mathematician Johann Carl Friedrich Gauss, can model many real-world situations. For instance, you can use them to model measurements such as clothing sizes, speeds, or product weights.

3. Adults' heights example

Let's look at an example. The heights of adults aged between 18 and 35 years are approximately normally distributed. The mean height of adult males in this age group is 70 inches, and the mean height of adult females is 65 inches. You can see both probability densities in the plot. Let's take a look at how this works.

4. Probability density

The probability density is a function that describes the relative likelihood of each possible outcome in the sample space. For normal distributions, the probability density has a bell shape: it is highest at the mean, symmetric around it, and falls off toward the tails.

5. Probability density examples

For instance, in the plot on the left you can see that the probability density at -1 is roughly 0.24. On the other hand, the probability density at 0 is roughly 0.4, as seen in the plot on the right. Comparing the probability densities for different values of the random variable tells you which values are relatively more likely: in this case, values near 0 are more likely than values near -1. But what is the probability of getting a value between -1 and 0?
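A minimal sketch of how those two densities could be computed, assuming (as the values 0.24 and 0.4 suggest) that both plots show the standard normal distribution with mean 0 and standard deviation 1. `norm.pdf` evaluates the probability density function.

```python
from scipy.stats import norm

# Density of the standard normal distribution (mean 0, standard deviation 1).
# Assumption: the plots on the slide show this standard normal.
density_at_minus_one = norm.pdf(-1)
density_at_zero = norm.pdf(0)

print(round(density_at_minus_one, 2))  # 0.24
print(round(density_at_zero, 2))       # 0.4

# The higher density means values near 0 are relatively more likely than values near -1
print(density_at_zero > density_at_minus_one)  # True
```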

6. Probability density and probability

To calculate such a probability, we need the area under the probability density curve between -1 and 0. We get it by subtracting the cdf (cumulative distribution function) at -1 from the cdf at 0. The probability is about 0.34.
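The same calculation as a short sketch, again assuming the standard normal distribution (mean 0, standard deviation 1). `norm.cdf(x)` returns the probability of getting a value less than or equal to `x`, so the difference of two cdf values is the area between them.

```python
from scipy.stats import norm

# P(-1 <= X <= 0) for a standard normal X:
# area under the density curve between -1 and 0
prob = norm.cdf(0) - norm.cdf(-1)

print(round(prob, 2))  # 0.34
```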

7. Symmetry

Normal distributions are symmetric around the mean. That means that the probability of getting a value below the mean is the same as the probability of getting a value above the mean: 0.5.
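A quick check of the symmetry property, as a sketch: the mean and standard deviation below are arbitrary, hypothetical values chosen only for illustration. `norm.sf` is the survival function, that is, 1 minus the cdf.

```python
from scipy.stats import norm

# Hypothetical parameters for illustration only (not from the slides)
mean, sd = 1.5, 2.0

below = norm.cdf(mean, loc=mean, scale=sd)  # P(X < mean)
above = norm.sf(mean, loc=mean, scale=sd)   # P(X > mean) = 1 - cdf
print(below, above)  # both 0.5

# Symmetry also means equal tails: P(X < mean - d) equals P(X > mean + d)
d = 1.0
left_tail = norm.cdf(mean - d, loc=mean, scale=sd)
right_tail = norm.sf(mean + d, loc=mean, scale=sd)
print(left_tail, right_tail)
```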

8. Mean

Because the probability density function is symmetric and has a single peak, the mean is also the value with the highest probability density. In this plot, the probability density indicated by the green dotted line has a mean of 0.

9. Mean (Cont.)

The probability density indicated by the blue dashed line has a mean of 1. You can see how the curve is shifted to the right.

10. Mean (Cont.)

The probability density indicated by the red solid line has a mean of -2, so the curve is shifted to the left.
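To see that changing the mean only slides the curve and moves its peak, here is a sketch that evaluates the density on a fine grid and finds where it is highest for the three means shown. The slides do not state the standard deviation of these curves, so a standard deviation of 1 is assumed for all three.

```python
import numpy as np
from scipy.stats import norm

# Dense grid of x values wide enough to cover all three curves
x = np.linspace(-6, 6, 12001)

peaks = {}
for mean in (0, 1, -2):
    # Assumption: all three curves share a standard deviation of 1
    density = norm.pdf(x, loc=mean, scale=1)
    # x value where the density is highest (the peak of the bell curve)
    peaks[mean] = x[np.argmax(density)]

for mean, peak_x in peaks.items():
    print(f"mean={mean}: density peaks at x={peak_x:.3f}")
```

The grid search is only an illustration; the peak sits exactly at `loc` by construction of the normal density.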

11. Standard deviation

The standard deviation is a measure of how spread out the probability density is. Depending on the standard deviation, the curve concentrates more or less of its probability around the mean. In this plot you can see a red solid line representing a probability density with mean 0 and standard deviation 0.64.

12. Standard deviation (Cont.)

And here you can see a green dotted line with a standard deviation of 1.

13. Standard deviation (Cont.)

Finally, the blue dashed line shows a standard deviation of 2. The lower the value of the standard deviation, the more concentrated the probability density is around the mean. Let's take a look at some other important properties.
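A sketch comparing the three curves from the slides (mean 0; standard deviations 0.64, 1 and 2). It reports two quantities: the height of the curve at the mean, and the probability of landing within 0.5 of the mean. The 0.5 window is an arbitrary choice made here for illustration.

```python
from scipy.stats import norm

results = {}
for sd in (0.64, 1, 2):
    # Height of the density curve at the mean (here 0)
    peak_height = norm.pdf(0, loc=0, scale=sd)
    # Probability of a value within 0.5 of the mean
    near_mean = norm.cdf(0.5, loc=0, scale=sd) - norm.cdf(-0.5, loc=0, scale=sd)
    results[sd] = (peak_height, near_mean)
    print(f"sd={sd}: peak density={peak_height:.3f}, P(-0.5 <= X <= 0.5)={near_mean:.3f}")
```

Both numbers shrink as the standard deviation grows: a wider curve is flatter and spreads its probability further from the mean.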

14. One standard deviation

From a statistical point of view, it's useful to know how far values lie from the mean in terms of standard deviations. In any normal distribution, about 0.68 of the probability lies within one standard deviation of the mean.

15. Two standard deviations

About 0.95 of the probability lies within two standard deviations of the mean.

16. Three standard deviations

And about 0.997 lies within three standard deviations of the mean.
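These three figures can be reproduced with the cdf. This is a sketch: the mean and standard deviation below are hypothetical, and any other values give the same probabilities, which is why the rule holds for every normal distribution.

```python
from scipy.stats import norm

# Hypothetical parameters: the result does not depend on them
mean, sd = 10.0, 2.5

probs = []
for k in (1, 2, 3):
    lower, upper = mean - k * sd, mean + k * sd
    # Area under the density between mean - k*sd and mean + k*sd
    prob = norm.cdf(upper, loc=mean, scale=sd) - norm.cdf(lower, loc=mean, scale=sd)
    probs.append(prob)
    print(f"within {k} sd: {prob:.4f}")
```

This prints 0.6827, 0.9545 and 0.9973, which round to the 0.68, 0.95 and 0.997 quoted above.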

17. Normal sampling

What if you want to generate a sample from a normal distribution? First, you import norm from scipy dot stats, matplotlib dot pyplot as plt, and seaborn as sns. Then you call norm dot rvs, passing the mean as the loc parameter, the standard deviation as the scale parameter, and the sample size as the size parameter. Set random_state to make the results reproducible. Finally, you use sns dot distplot to plot the sample. In recent versions of seaborn, distplot is deprecated in favor of histplot and displot.
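A minimal sketch of the sampling step. The mean of 0, standard deviation of 1, sample size of 10,000 and seed of 13 are illustrative choices, not values taken from the slides. Plotting is left out so the snippet runs without seaborn installed.

```python
from scipy.stats import norm

# Draw 10,000 values from a normal distribution with mean 0 (loc)
# and standard deviation 1 (scale).
# random_state fixes the seed so the same sample is produced on every run.
sample = norm.rvs(loc=0, scale=1, size=10000, random_state=13)

# The sample statistics should be close to the chosen parameters
print(sample.mean(), sample.std())

# Reproducibility: the same random_state gives the identical sample
repeat = norm.rvs(loc=0, scale=1, size=10000, random_state=13)
print((sample == repeat).all())  # True
```

To plot the sample with a current seaborn version, you could call `sns.histplot(sample, kde=True)` followed by `plt.show()`.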

18. Normal sampling (Cont.)

We get this nice bell-shaped probability density plot. Because a normal distribution is fully determined by its mean and standard deviation, knowing those two values is enough to plot any normal probability density.

19. Let's do some exercises with normal distributions

You now know the fundamentals of normal distributions. Let's see what we can do with them.