Prior belief
1. Prior belief
Let's learn about prior distributions!

2. Prior distribution
The prior distribution reflects what we know about the parameters we want to estimate before observing any data. We might know nothing and choose the uniform prior, under which all parameter values are equally likely. A posterior from a previous analysis can also serve as a new prior, so the posterior keeps updating as more data comes in. In general, however, one can choose any probability distribution as a prior. This allows us to include external information in the model, such as expert opinion, common knowledge, previous research results, or even our own beliefs.

3. Prior's impact
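To make the prior's impact concrete, here is a minimal grid-approximation sketch in Python; the grid resolution and toss counts are illustrative, not taken from the course:

```python
import numpy as np

# Grid of candidate heads-probabilities.
grid = np.linspace(0, 1, 101)

# Uniform prior: every heads-probability is equally likely.
prior = np.ones_like(grid)

def coin_posterior(heads, tails):
    """Binomial likelihood times the uniform prior, normalized over the grid."""
    likelihood = grid**heads * (1 - grid)**tails
    post = likelihood * prior
    return post / post.sum()

# With 1 heads out of 2 tosses, the posterior is wide and prior-dominated;
# with 50 heads out of 100 tosses, it is much narrower around 50%.
wide = coin_posterior(heads=1, tails=1)
narrow = coin_posterior(heads=50, tails=50)
```

As the toss count grows, the posterior's peak sharpens and the (uniform) prior matters less and less.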
Consider a coin toss again. We choose a uniform prior, shown in blue, which assumes that all heads-probabilities between 0% and 100% are equally likely. We toss twice and get one heads and one tails. The posterior, shown in orange, is quite similar to the prior. It looks like a heads-probability around 50% is most likely, but other values are also possible. This is because there is not enough data to learn from: almost all we know is the prior. We toss once more and get heads. The posterior moves slightly to the right, indicating that heads is more likely than tails, although other values of the heads-probability are still possible. As we get more data, the posterior becomes narrower and taller. With much more data, this effect is even stronger, and the prior plays a smaller role.

4. Prior distribution
We should choose the prior distribution before we see the data. As we've just seen, with little data the prior strongly impacts the shape of the posterior distribution. Even with more data, the prior choice can still affect the posterior results. To avoid accusations that we have cherry-picked a prior to produce desired results, we must follow two rules: the prior choice should be clearly stated, and it should be explained.

5. Choosing the right prior
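As a concrete example of comparing candidate priors, here is a numpy sketch of two densities that both say heads is unlikely, peaking near 25%; the parameter values are my illustrative choices, not the ones from the lesson:

```python
import numpy as np

grid = np.linspace(0.001, 0.999, 999)

# Beta(a=2, b=4) density, unnormalized: mode at (a-1)/(a+b-2) = 0.25.
beta_prior = grid * (1 - grid)**3

# Log-normal density (unnormalized); its mode is scale * exp(-s**2) ~ 0.25.
s, scale = 0.5, 0.32
lognormal_prior = np.exp(-np.log(grid / scale)**2 / (2 * s**2)) / grid

# Normalize both over the grid so they are comparable.
beta_prior /= beta_prior.sum()
lognormal_prior /= lognormal_prior.sum()
```

Plotted side by side, the two curves look very similar, so shape alone does not decide between them.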
But how do we make this choice? Imagine we believe, for some reason, that tossing heads is unlikely. We could capture such a belief with many different distributions. Consider these two: beta and log-normal. They are pretty similar: according to both, the probability of heads is likely around 25%. However, one of them is a better choice than the other.

6. Conjugate priors
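For coin tossing, conjugacy means the posterior can be sampled directly: a Beta(a, b) prior updated with observed heads and tails gives a Beta(a + heads, b + tails) posterior. A sketch of what a `get_heads_prob`-style function could look like; this is an illustrative reconstruction, not the course's exact implementation:

```python
import numpy as np

def get_heads_prob(tosses, prior_a=1, prior_b=1, num_draws=1000, seed=42):
    """Draw samples from the posterior of the heads-probability.

    Uses beta-binomial conjugacy: a Beta(a, b) prior updated with
    observed heads and tails yields a Beta(a + heads, b + tails) posterior.
    """
    rng = np.random.default_rng(seed)
    heads = int(sum(tosses))           # tosses: sequence of 1s (heads) and 0s (tails)
    tails = len(tosses) - heads
    return rng.beta(prior_a + heads, prior_b + tails, size=num_draws)

draws = get_heads_prob([1, 0, 1, 1, 0, 1])   # 4 heads, 2 tails
```

The resulting array of draws can be passed straight to seaborn's kdeplot to visualize the posterior.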
Some priors, when multiplied by specific likelihoods, yield posteriors of a known form. These are known as conjugate priors. For the coin-tossing data, the beta distribution is a conjugate prior: if we choose a beta prior, the posterior will have a known form, in this case also beta. How do we know this? Because it has been mathematically proven. For further reference, Wikipedia has a list of many conjugate priors for different use cases. Since the posterior is a known distribution, we can easily sample draws from it using functions from numpy.random, as you have done previously. Remember the get_heads_prob function? I implemented it using a beta prior and sampling 1000 draws from the posterior defined by the equation on the slide. Much easier than grid approximation!

7. Two ways to get the posterior
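The two routes can be put side by side. A minimal sketch assuming a uniform Beta(1, 1) prior and illustrative data of 7 heads in 10 tosses:

```python
import numpy as np

heads, tails = 7, 3

# Way 1 - simulate: with a Beta(1, 1) prior, the conjugate posterior is
# Beta(1 + heads, 1 + tails), so we can sample from it directly.
rng = np.random.default_rng(0)
draws = rng.beta(1 + heads, 1 + tails, size=100_000)

# Way 2 - calculate: grid approximation of the same posterior.
grid = np.linspace(0, 1, 1001)
posterior = grid**heads * (1 - grid)**tails   # likelihood times a flat prior
posterior /= posterior.sum()

# Both methods agree, e.g. on the posterior mean of 8/12.
simulated_mean = draws.mean()
calculated_mean = (grid * posterior).sum()
```

Here `draws` would go to kdeplot, while `grid` and `posterior` would go to lineplot; both pictures show the same distribution.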
So, we have two ways to get the posterior: we can simulate or calculate. If we use a conjugate prior, which makes the posterior known, we can sample from it using numpy to get an array of draws that can be plotted with seaborn's kdeplot function. If the posterior is not known, we can still calculate it using grid approximation. We then get a posterior probability for each grid element and can plot these probabilities against the grid with seaborn's lineplot function. The two methods give exactly the same results. Previously, we calculated using grid approximation to understand what's happening under the hood. In practice, however, simulation is easier and faster.

8. Let's practice working with priors!
Let's practice working with priors!