Probability rules
The method we've been using to fit Bayesian models so far has involved generating a large number of samples representing the joint probability distribution over parameters and unknown data, and then conditioning on the observed data by filtering away all the samples that don't match. This is a simple method, and we implemented it in just a couple of lines of R code, but the bad news is that this computational method scales horribly, both with larger data sets and with more complicated models. There are two pieces of good news, though. First, Bayesian computation is a hot research topic, and there are plenty of skilled scientists out there working hard on new methods that let you fit Bayesian models more efficiently. Second, the result of using a more efficient method will be the same as if you had used the slower method: the Bayesian model itself is unchanged, so everything you've learned so far still applies. The only difference is that with a faster method you'll get the result now rather than in a hundred years.

To work with and understand these faster computational methods, you need to know a bit of probability theory. A probability is a number between zero and one that we use to state the certainty or uncertainty of propositions, parameters, future data, and so on. In mathematical notation, P stands for probability, so P(n_visitors = 13) means: the probability of the number of visitors being equal to 13. A probability distribution is an allocation of probability over many mutually exclusive outcomes, and if you just write P(n_visitors), this refers to the probability distribution over all possible numbers of visitors. In statistics it's also common to talk about conditional probability, that is, the probability of this given that we know that. In probability notation this is written with a vertical bar: P(n_visitors = 13 | prop_clicks = 10%) means the probability of getting 13 visitors given that the proportion of clicks is 10%. This is a single probability, but the bar also works with probability distributions: P(n_visitors | prop_clicks = 10%) denotes the conditional probability distribution over the possible numbers of visitors given that the proportion of clicks is 10%. That is, it refers to the same probability distribution as you simulated before.

Finally, you can manipulate and combine probabilities using addition and multiplication. The first basic rule is the sum rule: if two possible outcomes are mutually exclusive, you can sum their probabilities to get the total probability that either one will be the outcome. For example, the probability of getting a 1 or a 2 or a 3 when rolling a die is 1/6 + 1/6 + 1/6, that is, a 50% probability. The second basic rule is the product rule: if two possible outcomes are unrelated, or independent, you can multiply their probabilities to get the probability that both will be the actual outcome. For example, the probability of rolling a six with one die and a six with another die is 1/6 × 1/6, that is, a 2.8% probability. This was all the probability notation and basic rules you'll need for now.
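Both rules are plain arithmetic, so you can check the two die examples directly in R:

```r
# Sum rule: the outcomes 1, 2, and 3 of a single die roll are mutually
# exclusive, so their probabilities add up.
p_1_or_2_or_3 <- 1/6 + 1/6 + 1/6
p_1_or_2_or_3  # 0.5, that is, a 50% probability

# Product rule: the two dice are independent, so the probabilities multiply.
p_two_sixes <- 1/6 * 1/6
p_two_sixes    # about 0.028, that is, a 2.8% probability
```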
If you want to dig deeper, I really recommend that you take a look at Dave Robinson's DataCamp course "Foundations of Probability in R". But for now, let's try out these rules in a couple of exercises.
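As a recap before the exercises, here is a minimal sketch connecting the notation back to the generate-and-filter method described at the top of this section. The specific numbers are assumptions for illustration: an ad shown to 100 site visitors, a uniform prior over the proportion of clicks, and 13 visitors observed.

```r
set.seed(42)          # for reproducibility
n_samples   <- 100000
n_ads_shown <- 100    # assumed number of ad views

# Samples from the joint distribution over parameters and unknown data:
# first the parameter from its (assumed uniform) prior, then the data.
prop_clicks <- runif(n_samples, min = 0, max = 1)
n_visitors  <- rbinom(n_samples, size = n_ads_shown, prob = prop_clicks)

# Conditioning on the observed data (13 visitors, an assumed value)
# by filtering away all samples that don't match.
posterior_prop_clicks <- prop_clicks[n_visitors == 13]

# P(n_visitors = 13 | prop_clicks = 10%): the proportion of simulated
# outcomes equal to 13 when the proportion of clicks is fixed at 10% ...
mean(rbinom(n_samples, size = n_ads_shown, prob = 0.10) == 13)

# ... which closely matches the exact binomial probability.
dbinom(13, size = n_ads_shown, prob = 0.10)
```

Note that the filtering step keeps only a small fraction of the samples, which is exactly why this method scales so poorly with more data and bigger models.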