1. Under the Bayesian hood
Welcome to Chapter 2!
2. Bayes' Theorem revisited
Here is Bayes' formula, which we previously used to calculate conditional probabilities of events. However, it can also be used to estimate model parameters! Let's replace A with parameters and B with data.
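For reference, the formula for two events A and B can be written in LaTeX notation as:

    P(A \mid B) = \frac{P(A) \, P(B \mid A)}{P(B)}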
3. Bayes' Theorem revisited
Here, each of the probabilities in the formula denotes a probability distribution rather than a single number.
The term on the left-hand side is what we are interested in: the posterior distribution, reflecting our knowledge about the parameters given the data we have.
On the right-hand side, we have the prior distribution, which is what we know about the parameters before seeing any data, multiplied by the likelihood, which says how likely the data is given the parameters. The whole thing is divided by a scaling factor to make sure it is a proper distribution that sums up to one. Don't worry if this seems abstract now, as we will demonstrate it next with an example.
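Putting those pieces together, the same formula with parameters and data reads:

    P(\text{parameters} \mid \text{data}) =
        \frac{P(\text{parameters}) \; P(\text{data} \mid \text{parameters})}{P(\text{data})}

The numerator is the prior times the likelihood, and the denominator is the scaling factor.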
Previously, you used the get-heads-prob function, which returns draws from the posterior distribution. Now, let's calculate this distribution exactly using Bayes' Theorem and a technique called grid approximation.
4. Tossing the coin again: grid approximation
Consider this question: what's the probability of tossing heads with a coin, if we observed 75 heads in 100 tosses?
We start the grid approximation by creating a grid of all possible combinations of the number of heads in 100 tosses and the probability of tossing heads. Obviously, we cannot list all probabilities from 0 to 1, so we'll build the grid in steps of 1 percentage point. We can create the two grids with numpy-dot-arange and list all combinations of the two using a list comprehension that loops over both arrays. Finally, we make it a DataFrame and call it "coin".
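A minimal sketch of this step; the column names num_heads and head_prob match how we will refer to them later:

    import numpy as np
    import pandas as pd

    # Grids: every possible number of heads in 100 tosses,
    # and head probabilities from 0 to 1 in steps of 0.01
    num_heads = np.arange(0, 101, 1)
    head_prob = np.arange(0, 1.01, 0.01)

    # All combinations of the two grids, as a DataFrame called "coin"
    coin = pd.DataFrame(
        [(h, p) for h in num_heads for p in head_prob],
        columns=["num_heads", "head_prob"],
    )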
5. Tossing the coin again: grid approximation
Now, we need a prior: what we know about the probability of tossing heads before seeing any data. Say we know nothing: it could be anything between 0 and 1. In that case, the uniform distribution is the right prior choice. We can import it from scipy-dot-stats, and then use its pdf method (which stands for probability density function) to get the prior probability for each head-prob value in the "coin" DataFrame.
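Here is a short sketch of this step, assuming we store the result in a column called prior:

    from scipy.stats import uniform

    # Uniform prior: every head_prob between 0 and 1 is equally likely
    coin["prior"] = uniform.pdf(coin["head_prob"])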
6. Tossing the coin again: grid approximation
Since under the uniform distribution all head-prob values are equally likely, we'll get ones everywhere.
7. Tossing the coin again: grid approximation
Next, the likelihood. We will model coin flips using a binomial distribution, so we import it from scipy-dot-stats. It has a pmf method which stands for probability mass function. It's like a pdf for discrete distributions. It takes three arguments: the number of heads, the total number of tosses, and the heads probability. We compute the likelihood for each row in the DataFrame.
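A sketch of the likelihood computation, again assuming a column name (likelihood) for storing the result:

    from scipy.stats import binom

    # Likelihood of seeing num_heads heads in 100 tosses, given each head_prob:
    # pmf(number of heads, total number of tosses, heads probability)
    coin["likelihood"] = binom.pmf(coin["num_heads"], 100, coin["head_prob"])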
8. Tossing the coin again: grid approximation
Consider the first row. The likelihood is 1 because if the heads probability is 0, then observing 0 heads in 100 tosses is 100% likely. In the second row, the heads probability is only slightly above zero. In this scenario, the likelihood of observing 0 heads in 100 tosses is almost 37%.
9. Tossing the coin again: grid approximation
Next, we follow Bayes' formula: we multiply the prior by the likelihood and scale by the sum of the products to get the posterior probabilities. The division-assignment operator overwrites the posterior_prob column with itself divided by its own sum.
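In code, this step could look as follows:

    # Bayes' formula: posterior is proportional to prior times likelihood
    coin["posterior_prob"] = coin["prior"] * coin["likelihood"]

    # Scale so that the posterior probabilities sum up to one
    coin["posterior_prob"] /= coin["posterior_prob"].sum()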
10. Tossing the coin again: grid approximation
11. Tossing the coin again: grid approximation
Phew! Here's all the code we have written put together for future reference.
12. Plotting posterior distribution
Finally, we can answer our question! We take only rows with num-heads equal to 75 and scale the posterior again to make sure it sums up to one. We can use the seaborn-lineplot function to plot different probabilities of tossing heads on the horizontal axis against their posterior probabilities on the vertical axis.
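A sketch of the plotting step, assuming the coin DataFrame built above:

    import seaborn as sns
    import matplotlib.pyplot as plt

    # Condition on the observed data: 75 heads in 100 tosses
    heads75 = coin[coin["num_heads"] == 75].copy()

    # Re-scale so the conditional posterior sums up to one
    heads75["posterior_prob"] /= heads75["posterior_prob"].sum()

    # Posterior probability of each head_prob value
    sns.lineplot(x=heads75["head_prob"], y=heads75["posterior_prob"])
    plt.show()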
13. Plotting posterior distribution
We get this posterior density plot of the probability of tossing heads.
Our estimate is that it is most likely around 75%, with values roughly between 60% and 85% remaining plausible.
14. Let's practice calculating posteriors using grid approximation!
Let's practice calculating posteriors using grid approximation!