Geometric distributions

1. Geometric distributions

Now that you're familiar with normal and Poisson distributions, it's time to work with geometric distributions. Let's do it.

2. Geometric modeling

We've already seen how binomial distributions can be used to model a series of experiments with success/failure outcomes. With a geometric distribution we model a series of failed outcomes until we obtain a successful one. For example, if we know that a basketball player has a 0.3 probability of scoring a free throw, what is the probability of missing the first throw and scoring the second? In the geometric model, we study how many trials it takes until we obtain the first success. Let's see how the distribution works!
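Here is a quick check of the free-throw question. It assumes that throws are independent and that p = 0.3 stays the same on every throw. Missing the first throw (probability 1 - 0.3 = 0.7) and then scoring (probability 0.3) gives a product of 0.21. The sketch computes this by hand and also with scipy's `geom.pmf`, which gives the probability that the first success happens on trial k.

```python
from scipy.stats import geom

p = 0.3  # probability of scoring a single free throw

# Miss the first throw, then score the second: (1 - p) * p
by_hand = (1 - p) * p

# geom.pmf(k, p): probability that the first success happens on trial k
with_scipy = geom.pmf(2, p)

print(round(by_hand, 4), round(with_scipy, 4))  # 0.21 0.21
```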

3. Geometric parameter

The geometric distribution lets us calculate the probability that the first success happens on trial k, given the probability of success on each trial. That per-trial probability of success is the distribution's only parameter. In the plot on the left, you can see the geometric distribution that models the free throw scoring rate of a basketball player with probability 0.3 of scoring on each throw. On the right, you can see the geometric distribution for a grizzly bear with a 0.033 probability of catching a salmon on each attempt.
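Because p is the only parameter, the whole distribution follows from one formula. The probability that the first success occurs on trial k is (1 - p)^(k - 1) · p, for k = 1, 2, 3, and so on. The sketch below writes that formula out by hand and checks it against `geom.pmf` for both examples above. The helper `geometric_pmf` is a hypothetical name used only for illustration and is not part of scipy.

```python
import numpy as np
from scipy.stats import geom

def geometric_pmf(k, p):
    """Hypothetical helper: P(first success on trial k) = (1 - p)**(k - 1) * p."""
    return (1 - p) ** (k - 1) * p

k = np.arange(1, 11)  # trials 1 through 10

for label, p in [("free throws", 0.3), ("grizzly bear", 0.033)]:
    manual = geometric_pmf(k, p)
    scipy_values = geom.pmf(k, p)
    # Both approaches should agree up to floating-point error
    print(label, np.allclose(manual, scipy_values))
```

With a small p like the bear's 0.033, the probabilities decay slowly. That is why the right-hand plot spreads over many more attempts than the left-hand one.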

4. Probability mass function (pmf)

What is the probability that a geometric random variable is equal to a given value? To answer that question, we use the probability mass function (pmf) and specify p, the probability of success. Imagine we want to calculate the probability that a grizzly bear makes its first catch on attempt 30, that is, after failing 29 consecutive times, given that it has a 0.033 probability of success on each attempt. To do this, we import the geom object from scipy dot stats and call geom dot pmf with k equals 30 and p equals 0.033. The result is about 0.0125. In other words, there's roughly a 1.2% chance that the bear's first successful catch comes on exactly its 30th attempt.
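The sketch below reproduces the pmf value for the bear, which works out to about 0.0125. It then contrasts that value with a different question: the chance of a catch on attempt 30 given that the first 29 attempts already failed. That conditional probability is pmf(30) / sf(29). Because the geometric distribution is memoryless, it reduces to p itself.

```python
from scipy.stats import geom

p = 0.033  # probability of catching a salmon on a single attempt

# Probability that the first catch happens exactly on attempt 30
# (29 failures followed by one success)
first_catch_at_30 = geom.pmf(k=30, p=p)
print(round(first_catch_at_30, 4))  # 0.0125

# Probability of success on attempt 30 *given* 29 failures so far:
# P(X = 30) / P(X > 29). Memorylessness makes this equal to p.
next_attempt = geom.pmf(30, p) / geom.sf(29, p)
print(round(next_attempt, 4))  # 0.033
```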

5. Cumulative distribution function (cdf)

Now, what if we want to know the probability that a basketball player scores a free throw in 4 or fewer attempts? For that we use the cumulative distribution function (cdf). In the plot on the left, you can see the pmf values for 1, 2, 3, and 4 attempts, which you have to add up to get the result. The plot on the right shows the cdf, which gives that cumulative probability directly. To solve this problem, we call geom dot cdf with k equals 4 and p equals 0.3 to get a probability of 0.76.
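The three routes to the same 0.76 can be checked side by side. The first is `geom.cdf`. The second sums the individual pmf values, which are the bars in the left plot. The third is the closed form 1 - (1 - p)^4, which is one minus the chance of missing all four throws. This is a minimal sketch with p = 0.3 from the example.

```python
from scipy.stats import geom

p = 0.3  # probability of scoring a single free throw

# P(X <= 4): the first score happens within the first 4 attempts
print(round(geom.cdf(k=4, p=p), 4))  # 0.7599

# Same value by adding the pmf for k = 1, 2, 3, 4
print(round(sum(geom.pmf(k, p) for k in range(1, 5)), 4))  # 0.7599

# Closed form: one minus the probability of missing all 4 throws
print(round(1 - (1 - p) ** 4, 4))  # 0.7599
```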

6. Survival function (sf)

Moving ahead, if we want the probability that the player needs more than 2 free throws to score for the first time, we use sf, the survival function, with k equals 2. The result is a probability of 0.49.
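The survival function is the complement of the cdf, sf(k) = 1 - cdf(k). For the geometric distribution it also has a direct reading: needing more than 2 throws means missing the first two, which has probability (1 - p)^2. A short sketch confirming all three agree for p = 0.3:

```python
from scipy.stats import geom

p = 0.3  # probability of scoring a single free throw

# P(X > 2): the player needs more than 2 throws to score for the first time
print(round(geom.sf(k=2, p=p), 4))  # 0.49

# Complement of the cdf, and the chance of missing the first two throws
print(round(1 - geom.cdf(2, p), 4), round((1 - p) ** 2, 4))  # 0.49 0.49
```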

7. Percent point function (ppf)

If you instead want the value at which the cumulative probability reaches a given level, you use the percent point function, ppf, which works as the inverse of the cdf. In the basketball example, for a probability of 0.6 we get a value of 3. We go from probability to value, as the arrow indicates: 3 is the smallest number of attempts for which the cumulative probability reaches at least 0.6.
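For a discrete distribution, the cdf jumps in steps, so ppf returns the smallest k whose cdf is at least the requested probability. The sketch below, again assuming p = 0.3, shows why the answer is 3. Two attempts accumulate only 0.51, while three accumulate 0.657.

```python
from scipy.stats import geom

p = 0.3  # probability of scoring a single free throw

# Smallest number of attempts k such that P(X <= k) >= 0.6
k = geom.ppf(0.6, p)
print(k)  # 3.0

# Check: 2 attempts are not enough, 3 attempts are
print(round(geom.cdf(2, p), 3), round(geom.cdf(3, p), 3))  # 0.51 0.657
```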

8. Sample generation (rvs)

Last but not least, we can generate 10,000 samples of attempts by the basketball player with p equals 0.3 using geom dot rvs. As usual, we import geom from scipy dot stats, then matplotlib dot pyplot as plt and seaborn as sns. Then we call rvs and specify p, the size of the sample, and random_state equals 13 to reproduce the results. To generate the plot we call sns dot distplot with sample as a parameter and kde equals False to avoid the density line, then we call plt dot show. The result is...

9. Sample generation (rvs) (Cont.)

This beautiful plot, where each bar shows how often one possible outcome occurred. All the frequencies add up to 10,000, the size of the sample.
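The sample generation described above can be sketched without the plot itself. Note that `sns.distplot`, the call used in the lesson, has been deprecated since seaborn 0.11 in favor of `sns.histplot`. To keep this sketch light on dependencies, it computes only the frequencies that each bar represents, using numpy's `np.unique`. It then compares them with the theoretical pmf.

```python
import numpy as np
from scipy.stats import geom

# 10,000 simulated "attempts until the first score" for p = 0.3;
# random_state=13 makes the draw reproducible, as in the lesson
sample = geom.rvs(p=0.3, size=10000, random_state=13)

# Tally how often each outcome occurs: these counts are the bar heights
values, counts = np.unique(sample, return_counts=True)
for value, count in zip(values, counts):
    print(value, count)

# Every draw lands in exactly one bar, so the counts add up to the sample size
print(counts.sum())  # 10000

# Empirical frequencies for the first few outcomes vs. the theoretical pmf
print(np.round(counts / counts.sum(), 3)[:5])
print(np.round(geom.pmf(values, 0.3), 3)[:5])
```

To draw the bars themselves, the same `sample` could be passed to `sns.histplot(sample, discrete=True)` followed by `plt.show()`.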

10. Let's go try until we succeed!

We're done with the geometric distribution -- now let's do some exercises, and try until we succeed!

This exercise is part of the course Foundations of Probability in Python.

A coin flip is the classic example of a random experiment. The possible outcomes are heads or tails. This type of experiment, known as a Bernoulli or binomial trial, allows us to study problems with two possible outcomes, like “yes” or “no” and “vote” or “no vote.” This chapter introduces Bernoulli experiments, binomial distributions to model multiple Bernoulli trials, and probability simulations with the scipy library.

Exercise 1: Let’s flip a coin in Python
Exercise 2: Flipping coins
Exercise 3: Using binom to flip even more coins
Exercise 4: Probability mass and distribution functions
Exercise 5: Predicting the probability of defects
Exercise 6: Predicting employment status
Exercise 7: Predicting burglary conviction rate
Exercise 8: Expected value, mean, and variance
Exercise 9: Calculating the expected value and variance
Exercise 10: Calculating the sample mean
Exercise 11: Checking the result
Exercise 12: Calculating the mean and variance of a sample

In this chapter you'll learn to calculate various kinds of probabilities, such as the probability of the intersection of two events and the sum of probabilities of two events, and to simulate those situations. You'll also learn about conditional probability and how to apply Bayes' rule.

Exercise 1: Calculating probabilities of two events
Exercise 2: Any overlap?
Exercise 3: Measuring a sample
Exercise 4: Joint probabilities
Exercise 5: Deck of cards
Exercise 6: Conditional probabilities
Exercise 7: Delayed flights
Exercise 8: Contingency table
Exercise 9: More cards
Exercise 10: Total probability law
Exercise 11: Formula 1 engines
Exercise 12: Voters
Exercise 13: Bayes' rule
Exercise 14: Conditioning
Exercise 15: Factories and parts
Exercise 16: Swine flu blood test

Until now we've been working with binomial distributions, but there are many other probability distributions a random variable can follow. In this chapter we'll introduce three more that are related to the binomial distribution: the normal, Poisson, and geometric distributions.

Exercise 1: Normal distributions
Exercise 2: Range of values
Exercise 3: Plotting normal distributions
Exercise 4: Within three standard deviations
Exercise 5: Normal probabilities
Exercise 6: Restaurant spending example
Exercise 7: Smartphone battery example
Exercise 8: Adults' heights example
Exercise 9: Poisson distributions
Exercise 10: ATM example
Exercise 11: Highway accidents example
Exercise 12: Generating and plotting Poisson distributions
Exercise 13: Geometric distributions

Exercise 14: Catching salmon example
Exercise 15: Free throws example
Exercise 16: Generating and plotting geometric distributions

Now that you know how to calculate probabilities and important properties of probability distributions, we'll introduce two important results: the law of large numbers and the central limit theorem. This will expand your understanding of how the sample mean converges to the population mean as more data becomes available, and of how the sum of random variables behaves under certain conditions. We will also explore connections between linear and logistic regression as applications of probability and statistics in data science.

Exercise 1: From sample mean to population mean
Exercise 2: Generating a sample
Exercise 3: Calculating the sample mean
Exercise 4: Plotting the sample mean
Exercise 5: Adding random variables
Exercise 6: Sample means
Exercise 7: Sample means follow a normal distribution
Exercise 8: Adding dice rolls
Exercise 9: Linear regression
Exercise 10: Fitting a model
Exercise 11: Predicting test scores
Exercise 12: Studying residuals
Exercise 13: Logistic regression
Exercise 14: Fitting a logistic model
Exercise 15: Predicting if students will pass
Exercise 16: Passing two tests
Exercise 17: Wrapping up