Expected value and variance

1. Expected value and variance

When we talk about a probability distribution, we're often interested in summarizing it with a few descriptive statistics.

2. Properties of a distribution

Two of the most interesting properties are where the distribution is centered, and how widely spread out it is. We describe these with the expected value and the variance.

3. Expected value

The expected value is the mean of the distribution. If you imagine we drew an infinite number of values from the distribution, the expected value is what the average of all those values would be. This puts it right at the center of the distribution. Let's try to find the expected value of the binomial distribution with size 10 and probability point-5. We can't draw an infinite number of values, but we can draw a lot of them. As you've done in the exercises, we can use rbinom() to simulate one hundred thousand draws with size 10 and probability point-5, then use the mean() function to take the average of these draws. We see the average is very close to 5. That's the "center" of the distribution if we displayed it as a histogram. If we try sampling from a binomial with size 100 and probability point-2, we find that the mean is very close to 20. As you might notice from these examples, there's a general rule: we can get the expected value of a binomial distribution by multiplying the size (the number of coins) by the probability that each is heads.
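The simulation described above could be sketched in R roughly as follows. The seed and the variable names (flips, flips_2) are assumptions added for illustration and reproducibility; they are not part of the original slides.

```r
# Estimate the expected value of a binomial by simulation,
# then compare it with the rule size * probability.
set.seed(1)  # arbitrary seed (an assumption), only so the run is repeatable

# 100,000 draws from a binomial with size 10 and probability 0.5
flips <- rbinom(100000, 10, 0.5)
mean(flips)    # should be very close to 5
10 * 0.5       # the rule: size times probability

# The same check with size 100 and probability 0.2
flips_2 <- rbinom(100000, 100, 0.2)
mean(flips_2)  # should be very close to 20
100 * 0.2      # the rule again
```

The simulated means will differ slightly from run to run (or with a different seed), while the product of size and probability is exact.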

4. Variance

The expected value measures the center of the distribution, but we also want a measure of how spread out the results are. Statisticians use the variance to measure this. Variance is the average squared distance of each value from the mean of the sample. The variance isn't quite as intuitive as the mean, but it has useful mathematical properties that will become clear in this course. R provides the var() function to calculate the variance of a particular sample. So we could simulate 100,000 draws from a binomial distribution with size 10 and probability point-5, then use var() to estimate the variance of that distribution. We'd see that the variance is very close to 2-point-5. We saw in the last slide that the mean of this distribution is 5, so 2-point-5 is the average squared distance between 5 and one random draw. The variance of the binomial distribution follows a general rule: the variance is the size times p times 1 - p, where p is the probability that each coin is heads. So for example, the variance of the binomial with parameters 10 and point-5 is 10 times point-5 times 1 minus point-5, which is 2-point-5, just as we saw in the simulation. We could try this with a second binomial distribution, with size 100 and probability point-2, and we'd see that the simulated variance is very close to 16, the same value we'd get by multiplying 100 times point-2 times 1 - point-2. Just as with the expected value, simulation gives us a way to estimate properties of the distribution by drawing many values, while mathematical rules can sometimes give us an exact answer.
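A minimal sketch of the variance simulation in R, under the same assumptions as before: the seed and the variable names (draws, draws_2) are illustrative and not taken from the slides. One detail worth knowing is that var() divides by the number of draws minus one rather than by the number of draws, so it differs very slightly from a literal "average squared distance"; with 100,000 draws the difference is negligible.

```r
# Estimate the variance of a binomial by simulation,
# then compare it with the rule size * p * (1 - p).
set.seed(1)  # arbitrary seed (an assumption), only so the run is repeatable

# 100,000 draws from a binomial with size 10 and probability 0.5
draws <- rbinom(100000, 10, 0.5)
var(draws)             # should be very close to 2.5
10 * 0.5 * (1 - 0.5)   # the rule: size * p * (1 - p)

# The "average squared distance from the mean", computed by hand.
# It divides by n, while var() divides by n - 1.
mean((draws - mean(draws))^2)

# The same check with size 100 and probability 0.2
draws_2 <- rbinom(100000, 100, 0.2)
var(draws_2)           # should be very close to 16
100 * 0.2 * (1 - 0.2)  # the rule again
```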

5. Rules for expected value and variance

We thus have two rules for the properties of the binomial distribution. The expected value of the binomial is the size times p, and the variance of the binomial is the size times p times 1 - p.
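Written in symbols, the two rules might look like this. The notation is an assumption added here, not something from the slides: X stands for the binomial variable, n for the size, and p for the probability.

```latex
X \sim \mathrm{Binomial}(n, p): \qquad
\mathrm{E}[X] = n\,p, \qquad
\mathrm{Var}(X) = n\,p\,(1 - p)

% The two examples from the earlier slides:
% n = 10,  p = 0.5:  E[X] = 10 \cdot 0.5 = 5,   Var(X) = 10 \cdot 0.5 \cdot 0.5 = 2.5
% n = 100, p = 0.2:  E[X] = 100 \cdot 0.2 = 20, Var(X) = 100 \cdot 0.2 \cdot 0.8 = 16
```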

6. Let's practice!