1. You've changed the prior!
You’ve now changed the prior to include the information you got from the social media company. What effect did that have?
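Before looking at the plots, the effect can be sketched numerically with a simple grid approximation (a minimal Python sketch: the data, 13 clicks out of 100 shown ads, and the old uniform 0%–20% prior are from the example, but the exact shape of the informed prior below is an assumption — just a narrow bump around 5%):

```python
import numpy as np
from math import comb

# Data from the running example: 13 clicks out of 100 shown ads.
n_ads, n_clicks = 100, 13
props = np.linspace(0, 1, 2001)  # grid of candidate click proportions

# Binomial likelihood of the observed data at each candidate proportion.
likelihood = comb(n_ads, n_clicks) * props**n_clicks * (1 - props)**(n_ads - n_clicks)

# Old prior: uniform between 0% and 20%.
uniform_prior = (props <= 0.20).astype(float)

# New informed prior: concentrated around 5%. The real shape implied by the
# social media company's numbers is unknown; this bump is an assumption.
informed_prior = np.exp(-0.5 * ((props - 0.05) / 0.02) ** 2)

def posterior_mean(prior):
    # Posterior on the grid: prior x likelihood, normalized to sum to one.
    post = prior * likelihood
    post /= post.sum()
    return (props * post).sum()

print(posterior_mean(uniform_prior))   # stays close to the data's 13%
print(posterior_mean(informed_prior))  # pulled down toward the prior's 5%
```

The two printed means illustrate the point of this slide: with the uniform prior the estimate is driven by the data alone, while the informed prior pulls the estimate toward 5%.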
2. Uninformative prior and posterior
Well, here is the old uniform prior and posterior.
3. Informative prior
And here is the new informed prior. If we didn’t have any data at all, then the prior would be all the model would know, but if we had lots of data, the information in the data would overwhelm the prior information and we would end up with pretty much the same posterior distribution independent of which prior was used. But now we just have a little data, so the
4. Informative posterior
resulting posterior is a mix of the information from the prior and the information from the data. The data is enthusiastic: “Hey, the proportion of clicks is likely around 13%!” The prior is less so: “Mmmm, it’s likely around 5%.” The resulting posterior is informed by both the data and the prior and ends up somewhere in between. Now we have two different models of the same data, so which should we choose? There is no correct answer here, but if the informed prior is based on genuinely good information, then the resulting estimate should be better. In this case, it’s up to you whether you believe the numbers you got from the social media company. Going forward, we’re going to go back to the uniform distribution between 0% and 20% that we used before, but now you know how easy it is to switch the prior, should you want to. Next up on reasons to use Bayesian data analysis is that
5. Next up on reasons to use Bayesian data analysis
it is easy to compare and contrast any outcomes from Bayesian models. A typical example of when you want to make comparisons is when you have two different experimental groups, like two different treatments or two different methods, and you want to compare these and see which seems the best. For example, say that the ads you’ve been running so far have been
6. Video vs Text 1
video ads, but you’ve been thinking that
7. Video vs Text 2
text ads could be more effective. To try this out you also paid for 100 text ads to be shown on the social media site, and that resulted in
8. Video vs Text 3
6 clicks and visits to your site. As the video ads resulted in 13 clicks, it seems like they are more effective, but how sure should you be of this? We could run the same model on the data from the video ads and the text ads, and take a look at
9. Video vs Text analysis
the corresponding posteriors over the underlying proportions of clicks. It looks like it’s more probable that the proportion of clicks is lower for the text ad, but there is some overlap between the two probability distributions. What we would want is to compare the performance of the text and video ads in such a way that we get a new probability distribution showing the probable difference. And this is easy to get, especially since these two distributions are represented by long vectors of samples.
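As a sketch of what those long vectors of samples could look like, here is one simple way to draw posterior samples for each ad type with a grid approximation (a Python illustration: the 0%–20% uniform prior and the click counts, 13/100 for video and 6/100 for text, are from the example; the function and variable names are just for this sketch):

```python
import numpy as np
from math import comb

rng = np.random.default_rng(42)
props = np.linspace(0, 0.20, 1001)  # uniform prior between 0% and 20%

def posterior_samples(n_clicks, n_ads=100, n_samples=10_000):
    # Posterior on the grid: uniform prior x binomial likelihood, normalized.
    post = comb(n_ads, n_clicks) * props**n_clicks * (1 - props)**(n_ads - n_clicks)
    post /= post.sum()
    # Represent the posterior distribution as a long vector of samples.
    return rng.choice(props, size=n_samples, p=post)

video_prop = posterior_samples(13)  # video ads: 13 clicks out of 100
text_prop = posterior_samples(6)    # text ads: 6 clicks out of 100
# The two sample vectors overlap, but video_prop tends to be higher.
```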
10. Comparing Video and Text ads
Here I’ve taken the samples that make up these two probability distributions, given them the names video_prop and text_prop, prop as in proportion, and put them into a data frame called posterior. As long as these samples are in a random order, and as long as I do it row by row, I can now calculate any kind of derived quantity, and the resulting new distribution of samples will correctly retain the uncertainty of the original two distributions. Now, we were interested in the difference, so for each row let’s
11. Comparing Video and Text ads
subtract text_prop from video_prop and put the result into the column prop_diff. Looking at the first couple of samples in prop_diff, we see that in most rows the video ads are better than the text ads. The whole vector of samples in prop_diff now represents the posterior probability distribution over the difference between video ads and text ads.
12. How does the prop_diff distribution look?
So how does this distribution look? Well, you’ll find out in the following exercises!