Geometric distributions

1. Geometric distributions

Now that you're familiar with normal and Poisson distributions, it's time to work with geometric distributions. Let's do it.

2. Geometric modeling

We've already seen how binomial distributions can be used to model a series of experiments with success/failure outcomes. With a geometric distribution we model how many trials it takes to obtain the first success: a run of failed outcomes followed by a successful one. For example, if we know that a basketball player has a 0.3 probability of scoring a free throw, what is the probability of missing the first throw and scoring the second? In the geometric model we want to study the probability of trying until we succeed. Let's see how the distribution works!
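The free-throw question can be worked out by hand. This is a minimal sketch, assuming the throws are independent and each has the same 0.3 chance of going in; the variable names are illustrative only.

```python
# Probability of scoring on any single free throw (from the example above).
p_score = 0.3

# Missing the first throw and scoring the second: one failure, then one success.
# Independence between throws is assumed here.
p_miss_then_score = (1 - p_score) * p_score

print(round(p_miss_then_score, 2))  # 0.21
```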

3. Geometric parameter

The geometric distribution allows us to calculate the probability that the first success occurs on the k-th trial, given the probability of success for each trial. The only parameter is that per-trial probability of success, p. In the plot on the left, you can see the geometric distribution that models the free throw scoring rate of a basketball player with probability 0.3 of scoring on each throw. On the right, you can see the geometric distribution for a grizzly bear with a 0.033 probability of catching a salmon on each attempt.
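To make the role of the single parameter concrete, here is a small sketch that tabulates the first few bar heights the two plots would show. It assumes the usual convention that k counts trials up to and including the first success. The helper geometric_pmf is hypothetical, written for this illustration, and not part of any library.

```python
def geometric_pmf(k, p):
    """Probability that the first success happens on trial k (k = 1, 2, ...)."""
    # k - 1 failures, each with probability 1 - p, followed by one success.
    return (1 - p) ** (k - 1) * p


# Per-trial success probabilities taken from the two examples in the text.
examples = {"basketball player": 0.3, "grizzly bear": 0.033}

# First five probabilities for each example.
first_five = {
    label: [geometric_pmf(k, p) for k in range(1, 6)]
    for label, p in examples.items()
}

for label, probs in first_five.items():
    print(label, [round(x, 4) for x in probs])
```

With a large p the probabilities drop off quickly, while with a small p they decay slowly across many attempts.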

4. Probability mass function (pmf)

What is the probability that a geometric random variable is equal to a given value? To answer that question, we use the probability mass function (pmf) and specify p, the probability of success. Imagine we want to calculate the probability that a grizzly bear catches its first salmon on the 30th attempt, after failing 29 consecutive times, given that it has a 0.033 probability of success on each attempt. To do this, we import the geom object from scipy dot stats and call geom dot pmf with k equals 30 and p equals 0.033. The result is about 0.0125. In other words, there's roughly a 1.2% chance that the bear's first catch comes on exactly its 30th attempt.
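The call described above might look like the sketch below. It also evaluates the closed-form expression (1 - p)^(k - 1) * p by hand so the two routes can be compared. The variable names are illustrative assumptions.

```python
from scipy.stats import geom

p_catch = 0.033  # chance of catching a salmon on a single attempt
k = 30           # 29 failed attempts, then a catch on the 30th

# Probability that the first success lands exactly on attempt k.
pmf_scipy = geom.pmf(k=k, p=p_catch)

# Same quantity from the formula: k - 1 failures followed by one success.
pmf_by_hand = (1 - p_catch) ** (k - 1) * p_catch

print(pmf_scipy, pmf_by_hand)
```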

5. Cumulative distribution function (cdf)

Now, what if we want to know the probability of a basketball player scoring a free throw in 4 or fewer attempts? For that we use the cumulative distribution function (cdf). In the plot on the left, you can see the individual probabilities you have to add to get the result. On the right we have the cdf, which gives you that accumulated probability directly. To solve this problem, we call geom dot cdf and specify k equals 4 and p equals 0.3 to get a probability of about 0.76.
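As a sketch of the two views described here, the snippet below gets the cdf value directly and also by adding the individual pmf values for k = 1 to 4. The variable names are illustrative.

```python
from scipy.stats import geom

p_score = 0.3  # chance of scoring on a single free throw

# Probability of the first score arriving within 4 attempts.
cdf_value = geom.cdf(k=4, p=p_score)

# The same probability built up by adding the pmf for k = 1, 2, 3, 4.
summed_pmf = sum(geom.pmf(k=k, p=p_score) for k in range(1, 5))

# Equivalent shortcut: 1 minus the chance of missing all 4 throws.
complement = 1 - (1 - p_score) ** 4

print(round(cdf_value, 4))  # 0.7599
```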

6. Survival function (sf)

Moving ahead, if we want to calculate the probability that the player needs more than 2 free throws to score, that is, misses the first 2, we use sf, the survival function, with k equals 2 and p equals 0.3. The result is a probability of 0.49.
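A short sketch of the survival function call, checked against its definition as the complement of the cdf. The variable names are illustrative.

```python
from scipy.stats import geom

p_score = 0.3  # chance of scoring on a single free throw

# Probability that the first score takes more than 2 attempts.
sf_value = geom.sf(k=2, p=p_score)

# By definition this is 1 - cdf(2), which equals the chance of two misses in a row.
one_minus_cdf = 1 - geom.cdf(k=2, p=p_score)
two_misses = (1 - p_score) ** 2

print(round(sf_value, 2))  # 0.49
```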

7. Percent point function (ppf)

If you instead want the value where you accumulate a given probability, you use the percent point function, ppf. In the basketball example, for 0.6 probability we get a value of 3. We go from probability to value, as the arrow indicates: 3 is the smallest number of attempts at which the accumulated probability reaches at least 0.6.
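The sketch below shows the ppf call for the basketball example and checks the cdf on either side of the returned value. For a discrete distribution the accumulated probability usually overshoots the requested one rather than matching it exactly. The variable names are illustrative.

```python
from scipy.stats import geom

p_score = 0.3  # chance of scoring on a single free throw

# Smallest number of attempts whose cumulative probability is at least 0.6.
attempts = geom.ppf(q=0.6, p=p_score)
print(attempts)  # 3.0

# The cdf just below and at that value brackets the requested probability.
cdf_below = geom.cdf(k=2, p=p_score)  # 0.51, not yet 0.6
cdf_at = geom.cdf(k=3, p=p_score)     # 0.657, first value past 0.6
print(round(cdf_below, 3), round(cdf_at, 3))
```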

8. Sample generation (rvs)

Last but not least, we can generate 10,000 samples of attempts by the basketball player with p equals 0.3 using geom dot rvs. As usual, we import geom from scipy dot stats, then matplotlib dot pyplot as plt and seaborn as sns. Then we call rvs and specify p, the size of the sample, and random_state equals 13 to reproduce the results. To generate the plot we call sns dot distplot with sample as a parameter and kde equals False to avoid the density line, then we call plt dot show. The result is...

9. Sample generation (rvs) (Cont.)

...this beautiful plot, in which each bar shows the frequency of one possible outcome. The frequencies add up to 10,000.
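As a sketch of the sampling step, the snippet below draws the same kind of sample and counts outcomes with NumPy instead of plotting, so it runs without a display. The counts are the bar heights the plot would show. Using NumPy here is an assumption made for this illustration. In recent seaborn versions distplot is deprecated, and sns dot histplot is the suggested replacement for drawing the bars.

```python
import numpy as np
from scipy.stats import geom

# 10,000 simulated "attempts until the first score" with p = 0.3.
sample = geom.rvs(p=0.3, size=10000, random_state=13)

# Frequency of each distinct outcome (1 attempt, 2 attempts, ...).
values, counts = np.unique(sample, return_counts=True)

# The frequencies add up to the sample size.
total = counts.sum()
print(total)  # 10000

# Share of samples that scored on the first attempt, next to the exact pmf value.
share_first_try = counts[values == 1][0] / total
print(round(share_first_try, 3), geom.pmf(k=1, p=0.3))
```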

10. Let's go try until we succeed!

We're done with the geometric distribution -- now let's do some exercises, and try until we succeed!