1. Generating discrete random variables
In the rest of this chapter, we'll learn about several probability distributions, because choosing the right probability distribution and sampling from it are among the most important steps in any Monte Carlo simulation.
First, we'll learn how to generate discrete random variables for use in our simulations.
2. Required imports
Throughout the rest of this course, we will use the many distributions available in SciPy's stats module. Import it as "st", along with several other familiar packages.
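As a minimal sketch (assuming the familiar packages are NumPy and Matplotlib, which the plots in this chapter rely on), the imports might look like this:

```python
# SciPy's stats module, imported as "st" per this course's convention
import scipy.stats as st

# Familiar packages: NumPy for arrays, Matplotlib for plotting
import numpy as np
import matplotlib.pyplot as plt
```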
3. Discrete uniform distribution
The first distribution we'll look at is the discrete uniform distribution.
The sampling behavior of this distribution can be illustrated by its theoretical probability mass function, or PMF, shown in the bar plot. Here, the distribution interval spans from three to twenty. The x-axis shows which values are available when randomly sampling this distribution, and the bar heights indicate the corresponding sampling probabilities.
Notice that when sampling from this distribution, only integer values are available, and there is an equal chance of obtaining any number between three and twenty.
This distribution might be used to sample ages in a city where there are roughly equal numbers of kids at each age from three to 20 years old.
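As a quick sketch, these theoretical probabilities can be reproduced with the randint distribution introduced on the next slide; the endpoints here follow the slide's interval of three to twenty:

```python
import scipy.stats as st

# Discrete uniform on the integers 3 through 20 (SciPy excludes the high value)
low, high = 3, 21

# Each of the 18 available integers has the same theoretical probability, 1/18
pmf = st.randint.pmf(range(low, high), low, high)
print(pmf)  # every entry is about 0.0556
```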
4. Sampling from the discrete uniform distribution
We'll use SciPy's randint to access the discrete uniform distribution and chain the dot-rvs function to sample from it. "rvs" stands for "random variates", SciPy's term for random variable samples. The samples object returned by dot-rvs is a NumPy array. The parameters "low" and "high" set the range of the distribution, and the size argument specifies how many samples we'd like to obtain.
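A minimal sketch of that call; the sample size of 1000 is an arbitrary choice for illustration:

```python
import scipy.stats as st
import matplotlib.pyplot as plt

# Draw 1000 samples from the discrete uniform distribution on 3..20;
# high is 21 because the high value is excluded
samples = st.randint.rvs(low=3, high=21, size=1000)

# Histogram of the simulated results, with one bin per integer
plt.hist(samples, bins=range(3, 22))
plt.show()
```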
Here's a histogram of the simulated results. Notice that the low and high parameters are three and 21, but the sampling results range from three through 20: in the rvs function, the high value is not included in the sampled distribution. Also notice that the different integers do not all have the same count. Why? Sampling is stochastic: each draw is random, but the more we sample, the more closely the results resemble the theoretical probability distribution.
5. Geometric distribution
Let's look now at the geometric distribution, which represents the probability distribution of the number of trials, X, needed to get one success, given the success probability, p, of a single trial.
Here's the theoretical PMF of a geometric distribution with p equals 0-point-5. There is a fifty percent chance of success with just one trial and a 25 percent chance that two trials are required for success.
For example, assume a law school graduate has a 50 percent chance of passing their licensing exam. The geometric distribution captures the number of exam attempts that this graduate needs to pass the exam.
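As a sketch, the two probabilities quoted above can be checked directly with SciPy's geom PMF:

```python
import scipy.stats as st

# Probability that the first success happens on trial k, with p = 0.5
print(st.geom.pmf(1, p=0.5))  # 0.5: success on the very first attempt
print(st.geom.pmf(2, p=0.5))  # 0.25: one failure, then a success
```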
6. Geometric distribution
Here's the PMF of a distribution with p of 0-point-3.
Decreasing the success rate, p, will increase the expected number of trials to get one success.
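In fact, the expected number of trials is one over p, which we can confirm with SciPy's mean function:

```python
import scipy.stats as st

# Expected number of trials to get one success is 1 / p
print(st.geom.mean(p=0.5))  # 2.0
print(st.geom.mean(p=0.3))  # about 3.33: a lower p means more trials on average
```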
7. Sampling from geometric distribution
SciPy's geom provides the geometric distribution, and we again sample from it by chaining the dot-rvs function. We specify the success rate in one trial, p, and the number of samples using the size argument. A histogram of the results shows a geometric distribution, as expected.
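A minimal sketch of this sampling step, using p of 0-point-3 and an arbitrary sample size of 1000:

```python
import scipy.stats as st
import matplotlib.pyplot as plt

# Draw 1000 samples of "number of trials until the first success"
samples = st.geom.rvs(p=0.3, size=1000)

# The histogram of the results resembles the geometric PMF
plt.hist(samples, bins=range(1, samples.max() + 2))
plt.show()
```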
8. More discrete probability distributions
There are many more discrete distributions to consider. For example, we use Poisson distributions to express the probability of a given number of events occurring in a fixed interval of time or space if these events are independent and occur with a known constant mean rate.
Binomial distributions are used to simulate the number of successes in a sequence of n independent experiments.
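Both follow the same dot-rvs sampling pattern. As a sketch, with arbitrary illustrative parameters (a mean rate of two events, and ten trials with p of 0-point-5):

```python
import scipy.stats as st

# Poisson: number of events in a fixed interval, given a mean rate mu of 2
poisson_samples = st.poisson.rvs(mu=2, size=1000)

# Binomial: number of successes in 10 independent trials with p = 0.5
binomial_samples = st.binom.rvs(n=10, p=0.5, size=1000)
```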
For more on these and other distributions, SciPy's documentation is a great resource.
9. Let's practice!
Now it's your turn!