1. The Poisson distribution
In the last chapter we tried flipping a very large number of coins, and we noticed that the resulting number of heads could be approximated by a normal distribution. In this lesson we'll introduce another distribution related to the binomial: the Poisson distribution.
2. Flipping many coins, each with low probability
Suppose that, as in the last lesson, we flip one thousand coins. This time, though, each coin has only a one in one thousand chance of landing heads. In other words, we're considering how often a rare event happens when it has a large number of opportunities to occur.
We can simulate this with rbinom: the first argument is the number of draws, the second (the size) is set to 1000, and the third (the probability of heads) is set to 1 divided by 1000.
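A minimal sketch of that simulation in base R, assuming 100,000 draws (the same count this lesson later passes to rpois) and an arbitrary seed chosen only so the results are reproducible:

```r
# Draw 100,000 binomial values: each one is the number of heads
# when flipping 1000 coins that each land heads with probability 1/1000
set.seed(1)  # arbitrary seed, only for reproducibility
binom_sample <- rbinom(100000, 1000, 1 / 1000)

head(binom_sample)   # a few individual outcomes
mean(binom_sample)   # average number of heads, close to 1
```

Each element of binom_sample is one repetition of the whole 1000-coin experiment.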
The histogram would look like this. Notice that, unlike in the last exercises, this distribution doesn't look like a bell curve at all! For one thing, it isn't symmetrical: the mean number of heads is only 1, and because the number of heads can't be smaller than zero, the distribution is bunched up against 0 with a tail stretching to the right. This particular case of the binomial, where n is large and p is small, can be approximated by the Poisson distribution.
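The lopsided shape can also be seen without plotting. This sketch uses base R's dbinom to list the exact binomial probabilities of 0 through 6 heads for the same 1000 coins:

```r
# Exact binomial probabilities of 0 through 6 heads (size = 1000, prob = 1/1000)
heads <- 0:6
probs <- dbinom(heads, size = 1000, prob = 1 / 1000)
round(setNames(probs, heads), 4)
# 0 and 1 heads are about equally likely (~0.368 each),
# then the probabilities fall off quickly to the right
```

There is nothing to the left of 0, while the right side trails off gradually, which is exactly the asymmetry described above.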
3. Properties of the Poisson distribution
The binomial distribution required two parameters to define it: a size, or number of flips, and a probability that each is heads.
The Poisson distribution is simpler, because it's described by only one parameter: the mean, which for the Poisson is usually denoted by the Greek letter lambda (λ).
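For reference, this is the standard probability mass function of a Poisson distribution with mean λ. It isn't stated in the lesson itself, but it shows that λ alone determines every probability:

```latex
P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \ldots
```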
The mean of a binomial is the number of flips times the probability of heads, so with 1000 flips and a 1 / 1000 probability for each, the mean is simply 1000 times 1 / 1000, which is 1. To simulate the corresponding Poisson, we use the rpois function, passing 100,000 as the number of draws followed by the parameter 1. The compare_histograms function shows that the two distributions are similar. That means that for this distribution we don't need the extra detail that it comes from 1000 coins; we just need the mean.
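compare_histograms is a helper provided by the course and isn't reproduced here. As a rough substitute, this sketch compares the two samples numerically with base R only, and then checks the exact probabilities with dbinom and dpois:

```r
# Simulate the binomial and the matching Poisson side by side
set.seed(1)  # arbitrary seed
binom_sample <- rbinom(100000, 1000, 1 / 1000)
pois_sample  <- rpois(100000, 1)  # lambda = n * p = 1000 * (1 / 1000) = 1

# Proportion of draws equal to 0, 1, ..., 5 in each sample
counts <- 0:5
rbind(
  binomial = sapply(counts, function(k) mean(binom_sample == k)),
  poisson  = sapply(counts, function(k) mean(pois_sample == k))
)

# Largest gap between the exact binomial and Poisson probabilities for these counts
max(abs(dbinom(counts, 1000, 1 / 1000) - dpois(counts, 1)))
```

The two rows of the table should agree to about two decimal places, and the exact probabilities differ by less than 0.001.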
One interesting fact about the Poisson distribution is that its variance is equal to its mean. That makes it convenient to work with: once you know lambda, you also know the variance, so there's nothing extra to calculate when you're simulating or estimating it.
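A quick empirical check of that property, sketched in base R with two example values of lambda (1 and 10, chosen arbitrarily) and, for comparison, the variance of the original binomial computed from the standard formula n * p * (1 - p):

```r
set.seed(1)  # arbitrary seed
lambdas <- c(1, 10)

# For each lambda, the sample mean and sample variance should both be near lambda
summary_table <- t(sapply(lambdas, function(lambda) {
  x <- rpois(100000, lambda)
  c(lambda = lambda, mean = mean(x), variance = var(x))
}))
summary_table

# Binomial variance with n = 1000, p = 1/1000: 0.999, very close to its mean of 1
binom_var <- 1000 * (1 / 1000) * (1 - 1 / 1000)
binom_var
```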
4. Poisson distribution
The Poisson distribution can have any mean, as long as it's positive. The mean could be very close to 0, like 0.1, in which case almost all of the outcomes will be 0 or 1. Or it could be a larger value like 10, in which case the distribution looks noticeably more symmetrical.
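Both cases can be checked in base R. This sketch computes exact probabilities for lambda = 0.1 with dpois, then measures asymmetry with a hand-written sample skewness function (a helper defined here for illustration), comparing it with the known Poisson skewness of 1 / sqrt(lambda):

```r
# Exact Poisson probabilities for a small mean
round(dpois(0:3, lambda = 0.1), 4)
p_zero_or_one <- sum(dpois(0:1, lambda = 0.1))
p_zero_or_one                     # probability of 0 or 1, about 0.995

# Illustrative helper: third standardized moment of a sample
sample_skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

set.seed(1)  # arbitrary seed
skews <- sapply(c(0.1, 10), function(lambda) sample_skewness(rpois(100000, lambda)))
skews
1 / sqrt(c(0.1, 10))              # theoretical values: about 3.16 and 0.32
```

A skewness near 0 means a symmetrical shape, so the drop from roughly 3.16 to roughly 0.32 reflects how the distribution evens out as lambda grows.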
Statisticians and scientists use the Poisson distribution when they're modeling rare events as counts, and when there isn't a meaningful total number of opportunities to keep track of, the way there is with the number of flips in the binomial distribution.
For example, you could be running a bookstore and modeling how many people walk in during each hour. You could be counting whales within a section of the ocean on one day, or counting cells under a microscope. In one sense, each of these is technically a fraction of a total: a percentage of all the people, or whales, or cells in the world. But you wouldn't think of it that way. You don't care about the probability of seeing each particular whale out of every whale in the world; you care about the number that you see.
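To make the bookstore example concrete, here is a small sketch that assumes a purely hypothetical average of 4 customers per hour and an 8-hour day; neither number comes from the lesson. Only the mean is needed, with no reference to the total number of potential customers:

```r
# Hypothetical assumption: customers arrive at an average of 4 per hour
lambda_per_hour <- 4

set.seed(1)  # arbitrary seed
hourly_visitors <- rpois(8, lambda_per_hour)  # one simulated 8-hour day
hourly_visitors

# Probability that a given hour has no customers at all: exp(-4), about 0.018
p_empty_hour <- dpois(0, lambda_per_hour)
p_empty_hour

# Probability that an hour has more than 8 customers
ppois(8, lambda_per_hour, lower.tail = FALSE)
```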
You'll explore a bit more about the Poisson distribution in these exercises.
5. Let's practice!