
Bayes' theorem

1. Bayes' theorem

Bayes’ theorem is an equation that is often mystified, and I promised that I would explain it in this chapter. But you know what? You’ve already used it! This

2. This is Bayes' theorem!

piece of code is an example of Bayes’ theorem! Let’s rewrite it in probability notation so that it feels more “official”. We calculated

3. This is Bayes' theorem!

the probability of different parameter values given some new data. And this is what it says here: the symbol that looks like an O with a bar through it is the Greek letter theta (θ), which is often used to represent parameters, and the capital D stands for the data. This just reads “the probability of the parameters θ given the data D”. OK, this we calculate as

4. This is Bayes' theorem!

the likelihood, which was the probability (or, sometimes, the relative probability) of the data given different parameter values,

5. This is Bayes' theorem!

times the prior, which was the probability of different parameter values θ given nothing, that is, before seeing the data. And finally, we needed to normalize this so that the probabilities over all parameter values summed to one. That is,

6. This is Bayes' theorem!

we divided by the total sum of the likelihood weighted by the prior. So this equation here is Bayes’ theorem!
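Written out as a formula, the verbal walkthrough above corresponds to this standard discrete form of Bayes’ theorem, where the sum in the denominator runs over all parameter values on the grid:

```latex
P(\theta \mid D) = \frac{P(D \mid \theta) \, P(\theta)}
                        {\sum_{\theta'} P(D \mid \theta') \, P(\theta')}
```

The numerator is the likelihood times the prior, and the denominator is exactly the normalizing sum described above.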

7. This is Bayes' theorem!

But it just describes what you calculated yourself in the previous exercise. The technique you used in the last exercise to fit the model is often called

8. Grid approximation

grid approximation. Grid, because you define a grid over all the parameter combinations you need to evaluate, and approximation because you often cannot try every parameter combination, for example when a parameter is continuous, so you have to settle for a hopefully representative subset. But in the first chapter, we used sampling to fit the very same model. So these are two very different algorithms that in the end give the same result and fit the same model. And it’s actually the case that there are many more algorithms for fitting Bayesian models, some more efficient than others.
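As a minimal sketch of what grid approximation for this binomial model could look like, here is one possible version in Python. The course’s own code isn’t shown here, so the observed click count n_visitors_observed and the grid resolution are made-up illustration values:

```python
import numpy as np
from scipy import stats

n_ads = 100                 # number of shown ads, fixed by the model
n_visitors_observed = 13    # hypothetical observed click count (illustration only)

# The grid: a representative subset of the continuous parameter p_clicks
p_grid = np.linspace(0.0, 0.2, 201)

# The prior: Uniform(0, 0.2), so every grid point is equally probable
prior = np.full(len(p_grid), 1.0 / len(p_grid))

# The likelihood: probability of the observed data for each parameter value
likelihood = stats.binom.pmf(n_visitors_observed, n=n_ads, p=p_grid)

# Bayes' theorem: likelihood times prior, normalized so it sums to one
posterior = likelihood * prior
posterior /= posterior.sum()

# For example, the most probable p_clicks value on the grid:
print(p_grid[np.argmax(posterior)])
```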

9. A mathematical notation for models

But as the model, and the result of fitting it, stay the same, there is a mathematical notation used to define only the model, leaving it open which algorithm is used to fit it. The binomial model we’ve used so far could be written like this: the number of shown ads is fixed to 100. p_clicks, the underlying probability of a click, is uncertain, and the model assumes it could be anything from zero to 0.2. That is, p_clicks is distributed (the little tilde, or squiggly character, is read as “is distributed”) as a Uniform distribution between zero and 0.2. n_visitors is also uncertain, and the model assumes n_visitors is distributed as a Binomial distribution with n_ads trials and a p_clicks probability of success. This notation, informally called tilde notation, can be found in many Bayesian books and papers, and is a convenient way of defining the model without going into computational details. This model has one data point and one parameter.
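Written out in that tilde notation, the model just described looks like this (a LaTeX rendering of the transcript’s description):

```latex
\begin{aligned}
n_{\text{ads}} &= 100 \\
p_{\text{clicks}} &\sim \text{Uniform}(0,\, 0.2) \\
n_{\text{visitors}} &\sim \text{Binomial}(n_{\text{ads}},\, p_{\text{clicks}})
\end{aligned}
```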

10. Up next: More parameters, more data!

In the next and final chapter, we’re going to raise the stakes and take a look at a two-parameter model with many more data points! This whole course has been focusing on the basics of Bayes, so it might feel like we’re taking baby steps, but you’ll also get to try a Bayesian tool that you, as a data scientist, could use in production.
