
Probability rules

1. Probability rules

The method we've been using to fit Bayesian models so far has involved generating a large number of samples representing the

2. A binomial joint distribution

joint probability distribution over parameters and unknown data. And then conditioning on the observed data by

3. A conditioned binomial joint distribution

filtering away all samples that don’t match. This is a simple method and we implemented it in just a couple of lines of R code, but
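The simulate-then-filter approach described above can be sketched in a few lines of R. This is a minimal illustration, not the course's exact code: it assumes a binomial model with 100 ads shown, 13 clicks observed, and a uniform prior on the underlying proportion of clicks — all of those specifics are illustrative choices.

```r
# Simulate-and-filter (rejection sampling) sketch.
# Assumed setup: 100 ads shown, 13 clicks observed, uniform prior.
set.seed(42)
n_samples <- 100000
prop_clicks <- runif(n_samples, min = 0, max = 1)                # sample from the prior
n_visitors <- rbinom(n_samples, size = 100, prob = prop_clicks)  # simulate data

# Condition on the observed data by keeping only the matching samples
posterior <- prop_clicks[n_visitors == 13]
mean(posterior)  # posterior mean of the underlying click proportion
```

Note how wasteful this is: only the samples where the simulated data exactly matches the observed data survive the filter, which is why the method scales so badly.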

4. Bad and good news

the bad news is that this computational method scales horribly, both with larger data sets and with more complicated models. But there are two pieces of good news. First, Bayesian computation is a hot research topic, and there are tons of skilled scientists out there working hard on new methods to allow you to fit Bayesian models more efficiently. Second, the result of using a more efficient method will still be the same as if you had used the slower method, as the Bayesian model is still the same, so everything you’ve learned so far still applies. The only difference is that with a faster method you’ll get the result now rather than in a hundred years. To work with and understand faster computational methods one does need to know a bit about

5. Probability theory

probability theory. So, a probability was a number between zero and one which we use to state the certainty or uncertainty of propositions, parameters, future data, and so on. A mathematical notation that is sometimes used is the following: here the P stands for probability, and P(n_visitors = 13) simply means the probability of the number of visitors being equal to 13. Now, a probability distribution was an allocation of probability over many mutually exclusive outcomes. If you just write P(n_visitors), this refers to the probability distribution over all possible numbers of visitors. In statistics it’s common to talk about conditional probability, that is, the probability of this given that we know that. In probability notation this is written with a vertical bar, so this means: the probability of getting 13 visitors given that the proportion of clicks is 10%. This is a single probability. But the notation also works with probability distributions, so this denotes the conditional probability distribution over the possible numbers of visitors given that the proportion of clicks is 10%. That is, this refers to
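These conditional probabilities can be computed directly in R with `dbinom`. This sketch assumes 100 ads are shown, an illustrative number not stated in the narration:

```r
# A single conditional probability, P(n_visitors = 13 | prop_clicks = 10%).
# Assumes 100 ads are shown (an illustrative choice).
dbinom(13, size = 100, prob = 0.10)   # roughly 0.074

# The whole conditional distribution, P(n_visitors | prop_clicks = 10%):
# one probability for each possible number of visitors from 0 to 100.
dbinom(0:100, size = 100, prob = 0.10)
```

The vector returned by the second call sums to one, since the outcomes 0 through 100 are mutually exclusive and exhaust all possibilities.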

6. P(n_visitors | prop_clicks = 10%)

the same probability distribution as you simulated before. Finally, you can

7. Manipulating probability

manipulate and combine probabilities using addition and multiplication. The first basic rule is

8. Manipulating probability

the sum rule. If two possible outcomes are mutually exclusive, then we can sum up their probabilities to get the total probability that either one will be the outcome. For example,

9. Manipulating probability

the probability of getting a 1 or 2 or 3 when rolling a die is

10. Manipulating probability

1/6 + 1/6 + 1/6. That is, there is a 50% probability. The second basic rule is
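The die-rolling sum rule above, written out in R:

```r
# Sum rule: rolling a 1, a 2, or a 3 are mutually exclusive outcomes,
# so their probabilities add up.
p_1_2_or_3 <- 1/6 + 1/6 + 1/6
p_1_2_or_3  # 0.5, that is, a 50% probability
```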

11. Manipulating probability

the product rule: if two possible outcomes are unrelated, or independent, then we can multiply their probabilities to get the probability that both will be the actual outcome. For example,

12. Manipulating probability

the probability of rolling a six with one die and a six with another die is

13. Manipulating probability

1/6 times 1/6, that is, a 2.8% probability. This was all the probability notation and basic rules you’ll need for now.
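And the two-dice product rule in R:

```r
# Product rule: the two dice are independent,
# so the probabilities multiply.
p_double_six <- (1/6) * (1/6)
p_double_six  # about 0.028, that is, a 2.8% probability
```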

14. Manipulating probability

If you want to dig deeper I really recommend that you take a look at Dave Robinson’s DataCamp course "Foundations of Probability in R".

15. Let's try out these rules!

But for now let’s try out these rules in a couple of exercises.
