Samples and posterior summaries

1. You just did some Bayesian data analysis!

Hey, you just did some Bayesian data analysis! You took a Bayesian model, gave it some data, and

2. Prop model result

got out the probability distribution over the underlying proportion of success for your zombie drug. In Bayesian jargon, you took a "prior probability distribution", "updated it" with data, and the result was a "posterior probability distribution". These two words, prior and posterior, are used so much in Bayesian data analysis that they are worth pointing out. A prior probability distribution is a distribution over some unknown quantity that you have before, prior to, updating it with some data. Here it would be

3. Prop model result - annotated 1

the blue distribution. And the posterior probability distribution is what this distribution turned into after, posterior to, we updated it with data. Here it would be

4. Prop model result - annotated 2

the last distribution at n=13. Often people drop the "probability distribution" part and just call these

5. Priors & Posteriors

priors and posteriors. So, a prior is a probability distribution that represents what the model knows before seeing the data. A posterior is a probability distribution that represents what the model knows after having seen the data. When fitting a Bayesian model, the end result is always a posterior over some quantity or parameter of interest. The posterior represents how uncertain the model is of the underlying value, and in the zombie example, the posterior was displayed

6. Prop model result - annotated 2 again

as a density plot showing where the probability is located. This can be a good way of communicating the result, but we might also want to summarize it further. For example, we might want to calculate a "best guess" for what the proportion of cured zombies would be. This calculation is easier to do if the posterior is represented as a vector of samples rather than as a plot. What do I mean by this?

7. The probability distribution over the number of 6's when rolling 5 dice

Well, take, for example, the probability distribution over the number of sixes you would get when rolling 5 dice. You can represent that distribution as a plot, or as a mathematical function, if that’s your thing.
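As a rough sketch of the "mathematical function" view: assuming fair dice, the number of sixes in 5 rolls follows a binomial distribution, so base R's dbinom gives the probability of each outcome directly. The variable names here are only illustrative.

```r
# Assumption: fair dice, so P(six) = 1/6 on each of the 5 rolls.
n_sixes <- 0:5

# Probability of rolling exactly 0, 1, ..., 5 sixes.
prob <- dbinom(n_sixes, size = 5, prob = 1/6)

# The distribution as a table of outcomes and probabilities.
dice_dist <- data.frame(n_sixes = n_sixes, prob = round(prob, 4))
print(dice_dist)

# The same distribution as a plot.
barplot(prob, names.arg = n_sixes,
        xlab = "Number of sixes", ylab = "Probability")
```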

8. The probability distribution over the number of 6's when rolling 5 dice

But you can also represent it as a long vector of samples where each value occurs in proportion to how probable it is. Having such a vector makes it easy to calculate new measures from the probability distribution. Say we want to know the average number of sixes we would expect to roll. This is hard to read from the plot, but easy to calculate from the vector of samples, as we can directly use the mean function in R. It turns out the average number of sixes is about 0.83. The easiest way of generating such a vector is to draw a random sample from the probability distribution: if the sample is large enough, each value will occur roughly in proportion to how probable it is.
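A minimal sketch of the sample-based view, again assuming fair dice. It uses rbinom to draw the samples; the sample size, seed, and variable names are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed, only so the run is repeatable

# Draw a large random sample: each element is the number of sixes
# in one round of rolling 5 fair dice.
n_samples <- 100000
dice_samples <- rbinom(n_samples, size = 5, prob = 1/6)

# The first few samples.
head(dice_samples)

# The average number of sixes; should land close to 5 * 1/6, about 0.83.
mean(dice_samples)

# How often each value occurs, as a proportion of all samples.
table(dice_samples) / n_samples
```

With a sample this large, the proportions in the last line should be close to the exact probabilities of the distribution.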

9. Posterior

Now, it’s actually the case that the prop_model function also returns a large random sample from the posterior distribution.

10. Finish off the Zombie drug analysis!

So, finish off the zombie drug analysis you started in the last exercise by calculating a couple of relevant summaries using the sample returned by prop_model.
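As a hedged sketch of what such summaries might look like: the code below does not call prop_model. Instead it uses a stand-in vector of samples drawn with rbeta, which is what the posterior would be under an assumed uniform prior and hypothetical data of 2 cured zombies out of 13. The data, the prior, and the 10% threshold are all assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, only so the run is repeatable

# Stand-in for the sample returned by prop_model (assumption):
# uniform prior plus hypothetical data of 2 cured and 11 not cured
# gives a Beta(1 + 2, 1 + 11) posterior.
posterior <- rbeta(10000, shape1 = 1 + 2, shape2 = 1 + 11)

# A "best guess" for the proportion of cured zombies.
median(posterior)

# An interval that contains the proportion with 90% probability.
quantile(posterior, c(0.05, 0.95))

# Probability that the proportion exceeds a hypothetical 10% threshold.
mean(posterior > 0.10)
```

The same three calls should work unchanged on any numeric vector of posterior samples.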