1. A little bit of background
At the end of this video we are going to do some actual Bayesian data analysis, but first, we need a little bit of background.
Bayesian data analysis is named after
2. Thomas Bayes
Thomas Bayes, who in the middle of the 18th century wrote the first article describing what we today would call Bayesian inference. But the term Bayesian inference doesn’t really give you any clues as to what it is; a better term would be probabilistic inference, because that’s what you do when you do Bayesian data analysis. It’s really just about using the full power of
3. Probability
probability theory to draw conclusions and learn from your data. Confusingly, the term probability can be defined in different ways. All definitions agree on the basic rules of probability and that it’s a number between 0 and 1, but they don’t agree on what probabilities stand for. The definition we are going to use here is that a probability is a statement about the certainty or uncertainty of different outcomes, where a probability of 1 means complete certainty that something is the case or is going to happen, and 0 means complete certainty that this something is not the case or that it’s not going to happen. This definition is very similar to the common-sense use of probability. Like, you might say “I’m 99% sure it’s gonna rain tomorrow”, which means you’re very certain, or you might say “It’s a 50-50 chance it’s going to rain”, which means you’re very uncertain, it could go either way. Probability doesn’t only have to be about yes/no-type events; it can also be used to describe uncertainty over continuous quantities. For example,
4. Rain probability distribution 1
here is a graph showing the probability over how many inches it will rain next week. Each bar shows the probability for the corresponding outcome, and together the probabilities sum to one. This graph here is also an example of a
5. Rain probability distribution 1
probability distribution, which is just an allocation of probability over many mutually exclusive outcomes.
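As a concrete sketch, here is how you could write down a small probability distribution like the rain one in R. The outcomes and probabilities below are made up for illustration; the point is that each mutually exclusive outcome gets a probability, and the probabilities together sum to one.

```r
# A made-up probability distribution over inches of rain next week
rain_inches <- 0:5
probability <- c(0.30, 0.25, 0.20, 0.15, 0.07, 0.03)

sum(probability)  # The probabilities over all outcomes sum to 1

# Draw it as a bar graph, one bar per outcome
barplot(probability, names.arg = rain_inches,
        xlab = "Inches of rain next week", ylab = "Probability")
```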
6. The role of probability distributions
So, the role of probability distributions in Bayesian data analysis is to represent uncertainty, and the role of Bayesian inference is to update these probability distributions to reflect what has been learned from the data. All this sounds a bit abstract, so let’s try running a simple Bayesian model and actually see what it looks like. Let’s look at
7. A Bayesian model for the proportion of success
a Bayesian model for an underlying proportion of success. What is “success” here? Well, it could be curing a patient, getting a click on an ad, getting tails when flipping a coin, etc. It depends on what data you have. And what we’re often interested in is the underlying proportion of success. Like, what proportion of patients would be cured by this drug, say. I’ve implemented a Bayesian model in R that estimates this and given it the name prop_model. prop_model takes data as its first argument and makes three assumptions: that the data is a vector of successes and failures represented by 1s and 0s; that there is an unknown underlying proportion of success, and whether a data point is a success or not is affected only by this proportion; and that, prior to seeing any data, any underlying proportion of success is equally likely. The result of prop_model is a probability distribution that represents what the model knows about the underlying proportion of success after having observed the data.
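To make this less abstract, here is a minimal sketch of the kind of calculation a model like prop_model performs, using a simple grid approximation. This is not the actual prop_model code, just an illustration of the three assumptions above with made-up data:

```r
# Made-up data: 1 = success, 0 = failure
data <- c(1, 0, 0, 1)

# A grid of candidate values for the underlying proportion of success
proportion <- seq(0, 1, by = 0.01)

# Assumption 3: before seeing any data, all proportions are equally likely
prior <- rep(1, length(proportion))

# Assumptions 1 and 2: each data point is a success or failure,
# affected only by the underlying proportion
likelihood <- dbinom(sum(data), size = length(data), prob = proportion)

# Combine prior and likelihood, then normalize so probabilities sum to one
posterior <- prior * likelihood
posterior <- posterior / sum(posterior)

plot(proportion, posterior, type = "h",
     xlab = "Underlying proportion of success", ylab = "Probability")
```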
8. Trying out prop_model
Let’s start by seeing what happens when we run the model with no data. Huh, we get a big blue square. The x-axis in this graph shows different values for the proportion of success, and the y-axis shows the probabilities of those values. The blue square is a uniform probability distribution saying that any proportion of success has equal probability. It’s labeled “Prior” because one assumption of the model was that, prior to seeing any data, any underlying proportion of success was equally likely. Now, let’s add a data point; let’s say this is from an experiment on whether patients got cured by a new drug.
9. Trying out prop_model
Unfortunately, the first patient didn’t get cured, marked red here for failure. The model now knows that a high proportion of success is improbable, because if it had been high, this first patient would probably have been cured too.
10. Trying out prop_model
The second patient got cured. Now the model knows that it’s improbable that the underlying proportion of success is close to 0 or close to 1.
11. Trying out prop_model
12. Trying out prop_model
13. Trying out prop_model
The next three patients didn’t get cured, and with each failure it becomes more and more probable that the underlying proportion of success is low.
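In code, the sequence of plots in these slides corresponds to calling prop_model with a growing data vector, roughly like this (assuming 1 stands for a cured patient, 0 for a not-cured one, and that prop_model accepts an empty vector for the no-data case):

```r
prop_model(c())               # No data yet: just the uniform prior
prop_model(c(0))              # First patient, not cured
prop_model(c(0, 1))           # Second patient, cured
prop_model(c(0, 1, 0, 0, 0))  # Three more patients, none cured
```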
14. Trying out prop_model
A final patient got cured, and what we know at the end of the experiment, after six data points, is that the underlying proportion of success is probably around 0.4, but with this little data there is large uncertainty: it could be as low as 0.10 or as high as 0.75. That was a quick little example of a simple Bayesian model. Now, prop_model isn’t part of any R package; I’ve just made it myself. But by the end of this course, you will know enough to implement it yourself too.
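To put rough numbers on that uncertainty, you can extend the earlier grid sketch to all six data points and summarize the resulting distribution. Again, this is just a sketch of the same type of calculation, not prop_model itself, and the sampled numbers will vary slightly from run to run:

```r
# The full experiment: 1 = cured, 0 = not cured
data <- c(0, 1, 0, 0, 0, 1)
proportion <- seq(0, 1, by = 0.01)

# Uniform prior, so the posterior is just the normalized likelihood
posterior <- dbinom(sum(data), size = length(data), prob = proportion)
posterior <- posterior / sum(posterior)

# Draw samples from the posterior and summarize them
draws <- sample(proportion, size = 10000, replace = TRUE, prob = posterior)
median(draws)                     # A best guess for the proportion
quantile(draws, c(0.025, 0.975))  # A wide interval, roughly 0.1 to 0.7
```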
15. Now, you try out prop_model!
But first, it’s your turn to try out prop_model in a couple of exercises.