
The normal distribution

1. The normal distribution

Earlier, I mentioned the famous Black-Scholes geometric Brownian motion model for asset prices. This model implies that log-returns over different time periods are independent and normally distributed.

2. Definition of normal

The normal distribution is a simple, tractable model, and it would certainly be convenient if log-returns were normal. But is this really the case for the typical risk factors that interest us? In this chapter, we are going to investigate. In this video, I'll define the normal distribution, mention a few of its properties, and then discuss estimating normal distributions and testing for normality. A variable X is said to have a normal distribution if its probability density takes the well-known simple form shown on the slide, known informally as the bell curve. It depends on only two parameters, mu and sigma.
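For reference, the bell-curve density shown on the slide is the standard formula, where mu is the location parameter and sigma the scale parameter:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```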

3. Properties of the normal

Here are some of the appealing properties of the normal distribution. mu is the mean and sigma squared is the variance; the usual notation for the distribution is N(mu, sigma^2). The two parameters are easily estimated from data, as we will shortly discuss. If you add two or more independent, normally distributed variables together, the sum is also normally distributed. Moreover, suppose you take the sum of independent and identically distributed variables with any distribution, so long as it has finite variance. As you add together more and more such variables, the distribution of the sum gets closer and closer to a normal distribution. This is the celebrated central limit theorem.

4. Central limit theorem (CLT)

On the slide, you can see an example with a Gamma distribution with shape parameter 2. As you add together 5, 100, and finally 1000 gamma variables, the distribution of the sum gets closer and closer to normal. The central limit theorem is the main reason why the normal distribution is the best-known distribution in statistics, and you are going to see an application of it in this chapter. So how do you estimate a normal distribution?
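The convergence described above can be sketched in R as follows. This is a minimal simulation, not the course's slide code: a gamma(shape = 2, rate = 1) variable has mean 2 and variance 2, so the sum of k such variables is standardized by subtracting 2k and dividing by sqrt(2k) before comparing to the standard normal density.

```r
set.seed(42)
n_sims <- 10000

for (k in c(1, 5, 100, 1000)) {
  # Simulate n_sims sums of k independent gamma(shape = 2) variables
  sums <- replicate(n_sims, sum(rgamma(k, shape = 2)))

  # Standardize: the sum has mean 2k and variance 2k
  z <- (sums - 2 * k) / sqrt(2 * k)

  hist(z, probability = TRUE,
       main = paste("Standardized sum of", k, "gamma variables"))
  curve(dnorm(x), add = TRUE, col = "red")  # compare with N(0, 1)
}
```

For k = 1 the histogram is visibly skewed; by k = 1000 it is essentially indistinguishable from the red normal curve.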

5. How to estimate a normal distribution

Suppose you have some data X1 to Xn, for example, some log-returns. You want to find values for the parameters mu and sigma that give a good fit to the data. One way of estimating the normal distribution is to use the method of moments. This is a rather grand name for a simple idea: mu is estimated by the sample mean, that is, the average of the data, and sigma squared is estimated by the sample variance. On the slide, you can see formulas for the estimators, which are usually written as mu.hat and sigma.hat squared. A subscript u has been added to the latter to show that it is the unbiased estimator of sigma squared, which is the version used by R. The expected value of an unbiased estimator is identically equal to the true parameter value. There is another version in which you divide by n rather than n-1, and this is the estimator obtained by applying the well-known maximum-likelihood method. In the rest of this course, whenever you see sigma.hat without a u subscript, this refers to the maximum-likelihood estimator rather than the unbiased moment estimator. Note that both estimators give very similar results in reasonably sized samples. Let's look at an example.
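The two variance estimators can be compared directly in R. This is a sketch with simulated data; R's built-in var() and sd() use the unbiased n-1 denominator, and rescaling by (n-1)/n gives the maximum-likelihood version.

```r
set.seed(1)
x <- rnorm(1000, mean = 0.05, sd = 0.2)  # simulated placeholder data
n <- length(x)

mu_hat <- mean(x)  # sample mean estimates mu

sigma2_u  <- var(x)                # unbiased estimator: divides by n - 1
sigma2_ml <- var(x) * (n - 1) / n  # maximum-likelihood estimator: divides by n
```

For n = 1000 the two variance estimates differ only by a factor of 999/1000, illustrating why the choice rarely matters in reasonably sized samples.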

6. FTSE example

Note that in this analysis, ftse contains the sorted numerical values of the FTSE log-returns. Thus head() gives the most negative log-returns, the largest losses, and tail() gives the largest positive returns. The functions mean() and sd() are used to compute the sample mean and the sample standard deviation. The former is close to zero, and the latter is around 0.019.
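A minimal sketch of this step, assuming ftse is a numeric vector of FTSE log-returns; here placeholder data are simulated so the snippet is self-contained.

```r
# Simulated stand-in for the course's FTSE log-return data
set.seed(123)
ftse <- sort(rnorm(1000, mean = 0, sd = 0.019))

head(ftse)  # most negative log-returns (largest losses)
tail(ftse)  # largest positive log-returns

mu_hat    <- mean(ftse)  # sample mean, close to zero
sigma_hat <- sd(ftse)    # sample standard deviation, around 0.019
```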

7. Displaying the fitted normal

You can then superimpose the fitted normal curve on the histogram of the data. The histogram is created with the hist() function. The nclass argument has been set to suggest 20 buckets, and the option probability = TRUE ensures that the total area of the histogram is one. In other words, it represents a probability density. Then the normal density is added on top using the lines() function with the color red for a red curve. The function dnorm() calculates the normal density, but you have to pass the estimated values for the mean and standard deviation that were obtained earlier. As you can see, the normal density is not the greatest fit to the histogram; it doesn't have the same high peak in the middle or the same amount of weight out in the tails.
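The plot described above can be sketched as follows. Again, ftse is assumed to be a numeric vector of log-returns, simulated here so the snippet runs on its own.

```r
# Simulated stand-in for the FTSE log-return data
set.seed(123)
ftse <- rnorm(1000, mean = 0, sd = 0.019)

# Histogram on the density scale (total area = 1)
hist(ftse, nclass = 20, probability = TRUE)

# Superimpose the fitted normal density in red,
# using the estimated mean and standard deviation
x <- seq(min(ftse), max(ftse), length.out = 200)
lines(x, dnorm(x, mean = mean(ftse), sd = sd(ftse)), col = "red")
```

With real return data, this plot typically reveals a sharper central peak and heavier tails than the fitted normal curve allows.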

8. Let's practice!

Now in the exercise, you'll try something similar with Dow Jones index returns.