Multivariate normal distribution

1. Multivariate normal distribution

The multivariate normal is one of the most important probability distributions. It is a generalization of the univariate normal and is specified by a mean vector and variance-covariance matrix. Even if all the variables individually follow a univariate normal, the joint distribution might not be a multivariate normal. Let's start with univariate normals.
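To see how normal marginals can fail to give a multivariate normal, here is a minimal sketch of a standard counterexample. It is not from the course itself; the variable names and the seed are arbitrary. Y is X with a random sign flip, so both are standard normal, yet their sum is exactly zero about half the time, which could not happen if the pair were bivariate normal.

```r
# Counterexample sketch: normal marginals, but a non-normal joint distribution.
set.seed(42)                               # arbitrary seed, for reproducibility only
n <- 10000
x <- rnorm(n)                              # X ~ N(0, 1)
s <- sample(c(-1, 1), n, replace = TRUE)   # random sign, independent of X
y <- s * x                                 # Y is also N(0, 1), by symmetry

# If (X, Y) were bivariate normal, X + Y would be normal as well.
# Instead, X + Y is exactly 0 whenever s == -1, roughly half of the draws.
prop_zero <- mean(x + y == 0)
prop_zero
```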

2. Univariate normal distribution

A univariate normal is a symmetric, one-dimensional, bell-shaped distribution. It is defined by a scalar mean and variance parameter and is widely used to model many natural phenomena.

3. Density shape of a bivariate normal

In contrast, the bivariate normal is a function of two variables, x and y, defined by a mean vector of length 2 and a 2 by 2 variance-covariance matrix. Notice that the overall density is shaped like a bell and that, viewed along either the x or the y axis, its profile looks similar to the univariate normal.

4. Bivariate normal density - 3D density plot

The mean of a bivariate normal is denoted by mu and the variance-covariance matrix is denoted by sigma. Here we plot the density specified by the mean vector (1, 2) and the variance-covariance matrix sigma, with diagonal entries 1 and 2 specifying the variances and an off-diagonal covariance of 0.5. The color represents the height of the density at any given location.
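As a rough sketch of how such a surface could be computed, the code below evaluates the density on a grid with dmvnorm() from the mvtnorm package and draws it with base R's persp(). The grid ranges, the viewing angles, and the object names are assumptions chosen for illustration; the course's own plot may have been produced differently.

```r
library(mvtnorm)   # provides dmvnorm()

mu <- c(1, 2)                          # mean vector from the slide
sigma <- matrix(c(1,   0.5,
                  0.5, 2), nrow = 2)   # variances 1 and 2, covariance 0.5

# Evaluation grid (ranges are an arbitrary choice that covers most of the mass)
x_seq <- seq(-3, 5, length.out = 50)
y_seq <- seq(-3, 7, length.out = 50)
grid  <- expand.grid(x = x_seq, y = y_seq)

# Density at every grid point, reshaped so rows index x and columns index y
dens     <- dmvnorm(as.matrix(grid), mean = mu, sigma = sigma)
dens_mat <- matrix(dens, nrow = length(x_seq))

# 3D surface of the density
persp(x_seq, y_seq, dens_mat, theta = 30, phi = 30,
      xlab = "x", ylab = "y", zlab = "density")

# The density is highest at the mean
peak <- dmvnorm(mu, mean = mu, sigma = sigma)
peak
```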

5. Bivariate normal density - contour plot

You can also use a contour plot to visualize the same bivariate density. The contours join points on the surface that have the same height. For any bivariate normal, the contours are ellipses centered at the mean. The width and orientation of the ellipses are determined by the variance-covariance matrix.
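One way to make the link between sigma and the ellipses concrete is the eigendecomposition, shown here as a small base R sketch. The eigenvectors give the directions of the ellipse axes, and the square roots of the eigenvalues are proportional to the axis lengths. The object names are illustrative only.

```r
# Variance-covariance matrix from the density plotted earlier
sigma <- matrix(c(1,   0.5,
                  0.5, 2), nrow = 2)

eig <- eigen(sigma)
eig$values     # variances along the two principal axes of the ellipses
eig$vectors    # columns are the directions of those axes

# Ratio of the long axis to the short axis of every contour ellipse
axis_ratio <- sqrt(eig$values[1] / eig$values[2])
axis_ratio
```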

6. Bivariate normal density with a different mean

Changing the mean will move the center of ellipses to a new location, without altering the shape.

7. Bivariate normal density with a different variance

Changing the variance-covariance matrix changes the shape of the ellipses. In the special case where the off-diagonal entry, or covariance, equals zero and the variances are equal, the ellipses become circles.

8. Bivariate normal density with strong correlation

In contrast, a very high positive correlation will make the contour ellipses very narrow. When the two variances are equal, the ellipses are aligned with the 45-degree line.
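The two cases can be compared numerically with a short base R sketch. The matrices below are illustrative assumptions (unit variances, with a covariance of either 0 or 0.95), not the ones shown on the slides.

```r
# Case 1: equal variances, zero covariance -> circular contours
sigma_circle <- diag(2)
eig_circle   <- eigen(sigma_circle)
eig_circle$values              # both eigenvalues equal, so no preferred direction

# Case 2: equal variances, strong positive correlation -> narrow ellipses
sigma_corr <- matrix(c(1,    0.95,
                       0.95, 1), nrow = 2)
eig_corr   <- eigen(sigma_corr)

# Direction of the long axis, in degrees from the x axis
v <- eig_corr$vectors[, 1]
angle_deg <- atan(v[2] / v[1]) * 180 / pi
angle_deg

# How much longer the long axis is than the short one
axis_ratio <- sqrt(eig_corr$values[1] / eig_corr$values[2])
axis_ratio
```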

9. Functions for statistical distributions in R

Most statistical distributions in R have four functions. For the univariate normal they are rnorm(), dnorm(), qnorm(), and pnorm(), whereas for the multivariate normal the mvtnorm package contains the functions rmvnorm(), dmvnorm(), qmvnorm(), and pmvnorm().

10. Functions for statistical distributions in R

Notice that the first letters are common. The letter p stands for cumulative probability, q for quantile, d for density calculation, and r for generating random samples. These letters are followed by the abbreviation for the distribution, like norm for normal and mvnorm for multivariate normal.
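As a quick illustration of the naming pattern, here are the four univariate functions applied to the standard normal. The input values are arbitrary choices for the example.

```r
# d: density of N(0, 1) at 0, which equals 1 / sqrt(2 * pi)
d0 <- dnorm(0)

# p: cumulative probability P(X <= 1.96), close to 0.975
p196 <- pnorm(1.96)

# q: quantile, the inverse of p; qnorm(0.975) is close to 1.96
q975 <- qnorm(0.975)

# r: random samples (set a seed to make the draws reproducible)
set.seed(1)
draws <- rnorm(5, mean = 0, sd = 1)

c(d0 = d0, p196 = p196, q975 = q975)
draws
```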

11. The rmvnorm function

For example, to generate random samples from a multivariate normal distribution we use the function rmvnorm(). We need to specify n, the number of samples; mean, the mean vector; and sigma, the variance-covariance matrix.

12. Using rmvnorm to generate random samples

To generate 1000 samples from the specified mean and sigma, we first create the mean vector, mu1, and the variance-covariance matrix, sigma1. If we want to be able to reproduce the samples, we should call the set.seed() function first. Then we call the rmvnorm() function with n equal to 1000, mean equal to mu1, and sigma equal to sigma1.
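The steps just described might look like the sketch below. The transcript does not state the values of mu1 and sigma1, so the numbers here are assumptions: they are chosen so that X1 and X2 have a correlation of about 0.707 and X3 is uncorrelated with both, which matches the simulation results discussed next. The seed is arbitrary.

```r
library(mvtnorm)   # provides rmvnorm()

# Assumed values for illustration; not given in the transcript
mu1 <- c(1, 2, -5)
sigma1 <- matrix(c(1, 1, 0,
                   1, 2, 0,
                   0, 0, 5), nrow = 3)

set.seed(34)       # arbitrary seed so the samples can be reproduced
samples <- rmvnorm(n = 1000, mean = mu1, sigma = sigma1)

dim(samples)       # one row per sample, one column per variable
colMeans(samples)  # should be close to mu1
```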

13. Simulation results

The ggpairs() plot reflects the univariate normality of each of the dimensions and the bivariate normality of each pair. Notice the roughly circular shapes of the pairs plots involving variable X3, which is uncorrelated with X1 and X2, compared to the elliptical shape of the pairs plot of X1 and X2. The elliptical shape of this plot reflects the positive correlation between the variables. The narrower the ellipse, the stronger the correlation between the two variables. The correlation of 0.707 between X1 and X2 corresponds to the moderately narrow ellipse.
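The 0.707 figure can be checked directly from a variance-covariance matrix with base R's cov2cor(). The sketch below uses an assumed sigma1 (the transcript does not give its entries) in which X1 and X2 have variances 1 and 2 and a covariance of 1. The plotting step is optional and only runs if the mvtnorm and GGally packages are installed.

```r
# Assumed variance-covariance matrix; chosen to be consistent with the narration
sigma1 <- matrix(c(1, 1, 0,
                   1, 2, 0,
                   0, 0, 5), nrow = 3)

# Convert covariances to correlations: cov / (sd_i * sd_j)
corr1 <- cov2cor(sigma1)
corr1
rho12 <- corr1[1, 2]   # 1 / sqrt(1 * 2), about 0.707

# Optional: pairs plot of simulated data
if (requireNamespace("mvtnorm", quietly = TRUE) &&
    requireNamespace("GGally", quietly = TRUE)) {
  set.seed(34)         # arbitrary seed
  samples <- mvtnorm::rmvnorm(1000, mean = c(1, 2, -5), sigma = sigma1)
  # data.frame() names the columns X1, X2, X3
  print(GGally::ggpairs(data.frame(samples)))
}
```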

14. Let's practice simulating from a multivariate normal distribution!

Now let's practice simulating from a multivariate normal distribution.