1. The geometric distribution
Suppose this coin I'm holding has a 10% chance of heads. Now instead of flipping it a fixed number of times, I keep flipping it until the first time I see heads.
How many tails do you think I'll get before the first time I get heads? I could get heads on the first try, so it could be zero, or I could be standing here flipping a coin all day. What can I expect?
A random variable like this, where you're waiting for a particular event that has some fixed probability on each try, follows a geometric distribution, and it's the last one we'll explore in this course.
2. Simulating waiting for heads
One way to simulate waiting for a result of heads is to flip a large number of coins and see which one is the first heads. For example, with rbinom(100, 1, 0.1) we can simulate flipping 100 coins, each with a 10% probability of heads, in sequence. In this case we're only showing the first 30, and here we can see that two of the first 30 coins were heads, but the rest were tails.
If we don't want to count the coins manually, we can use the which function, which returns the indices that fit a particular condition. For example, with which(flips == 1), we can find out which coins in the sequence were heads.
In this case, we see the first heads was the eighth flip. We're only interested in the first heads, so let's use [1] to extract the first item. This code therefore gave us one draw of this random variable.
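The steps just described can be put together as a minimal sketch. The seed and the names heads_positions and first_heads are assumptions added for illustration, so the value you get will not necessarily match the eighth flip mentioned in the narration.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible

# Flip 100 coins in sequence, each with a 10% chance of heads (1 = heads)
flips <- rbinom(100, 1, 0.1)

# Indices of every flip that came up heads
heads_positions <- which(flips == 1)

# Position of the first heads; this is NA if none of the 100 flips was heads
first_heads <- heads_positions[1]
first_heads
```

Note that 100 flips is a practical cutoff: there is a very small chance that no heads appears at all, in which case the result is NA.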
3. Replicating simulations
We could repeat this code to perform more draws from this distribution, and get a sense of their possible values. In one draw, the first heads could be the 28th flip. In another, it could be the 4th. In another case, the 11th.
It's a hassle to keep writing the same line of code to perform multiple draws, so R makes it easy with the replicate function. replicate takes two arguments: the first is the number of replications to perform, and the second is the line of code, which you can just paste in directly.
The result is a set of ten outcomes (22, 12, 6, and so on), each one representing one draw where we kept flipping a coin and waiting for the first heads.
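A short sketch of that replicate call, assuming ten replications as in the narration. The seed and the name draws are illustrative, so the specific outcomes will differ from the ones on the slide.

```r
set.seed(2)  # assumed seed, for reproducibility only

# Repeat the "flip 100 coins, find the first heads" expression ten times
draws <- replicate(10, which(rbinom(100, 1, 0.1) == 1)[1])

# Ten positions of the first heads, one per simulated sequence
draws
```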
This is the geometric distribution. It's useful for modeling situations where, for example, a machine has a 10% chance of breaking each day, and you want to know how long it will last before it breaks. As you see, such a machine might last for weeks, or it might break on one of the first days.
4. Simulating with rgeom
It's worth understanding how you can generate a distribution like this yourself with the replicate() function, since being flexible with simulations is important in probability. But in this case, R does provide a shortcut.
Much as you've seen the rbinom, rnorm, and rpois functions for simulating draws from the binomial, normal, and Poisson distributions, you can use the rgeom function to simulate draws from the geometric distribution. You give it the number of draws, such as 100,000, and the probability of the event you're waiting for, in this case 0.1.
Notice that the distribution is steadily decreasing in density: every possible value is less likely than the previous one. The most likely value is therefore 0, meaning, in this case, that there were no tails before the first heads.
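One way to check the decreasing shape without a plot is to look at the exact probabilities. This sketch uses dgeom, the density counterpart of rgeom in base R, which the narration does not mention; the names probs and ratios are illustrative.

```r
# Exact probabilities of seeing 0, 1, ..., 5 tails before the first heads
probs <- dgeom(0:5, 0.1)
probs

# Ratio of each probability to the one before it:
# every extra tails multiplies the probability by 1 - 0.1 = 0.9
ratios <- probs[-1] / probs[-length(probs)]
ratios
```

Because each ratio is below 1, the probabilities shrink at every step, which is why 0 is the most likely value.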
However, what's the expected value, or the mean, of this distribution? We can estimate it by calling mean on the simulated draws, and see that it's about 9. That is, when each coin has a 10% chance of heads, there will be, on average, nine tails before the first heads, so the first heads will, on average, be the tenth coin.
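As a rough sketch, the estimate can be reproduced with rgeom and compared against the hand-rolled simulation from earlier. The seed, the variable names, and the choice of 10,000 replications for the manual version are assumptions. The manual version records the position of the first heads, so 1 is subtracted to count the tails before it.

```r
set.seed(3)  # assumed seed, for reproducibility only

# 100,000 draws: number of tails before the first heads, with p = 0.1
geom_draws <- rgeom(100000, 0.1)
geom_mean <- mean(geom_draws)
geom_mean  # should land close to 9

# Hand-rolled version: position of the first heads, minus 1
manual_draws <- replicate(10000, which(rbinom(100, 1, 0.1) == 1)[1]) - 1

# na.rm = TRUE drops the rare sequences with no heads in 100 flips
manual_mean <- mean(manual_draws, na.rm = TRUE)
manual_mean  # should also land close to 9
```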
The general rule is that the expected value is 1 divided by the probability, minus 1, or 1/p - 1. The minus one comes from the fact that R defines the geometric distribution as the number of tails before the first heads. If we defined it as the position of the first heads, it would simply be 1/p.
This means, for example, that if each coin had a 50% chance of heads, the expected value would be 1: on average, you'd see just one tails before the first heads.
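The rule can be checked with a small helper. expected_tails is a hypothetical function written for this sketch, not something R provides, and the seed is likewise an assumption.

```r
# Hypothetical helper: expected number of tails before the first heads
expected_tails <- function(p) 1 / p - 1

expected_tails(0.1)  # 9
expected_tails(0.5)  # 1

set.seed(4)  # assumed seed, for reproducibility only

# Simulated check for a fair coin: the mean should be close to 1
fair_mean <- mean(rgeom(100000, 0.5))
fair_mean
```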
In your exercises, you'll use the geometric distribution to model useful situations like waiting for a machine to break.
5. Let's practice!