1. Change anything and everything
In the last exercises, you took
2. Video vs text - posteriors
the posteriors from two different models and, using nothing but basic arithmetic, calculated
3. Video vs text - Posterior difference
the probability distribution over the difference in profit between using video and text ads. This was all pretty easy to do, and the surprising result was that, if forced to choose, you should actually go for text ads. But this distribution also tells you that there is a lot of uncertainty over which type of ad is best, so you really should collect more data before making a decision. You've already seen that it is easy to tweak the prior of a Bayesian model, but
4. Next up on reasons to use Bayesian data analysis
the fourth reason to use Bayes is that you can also completely change the underlying statistical model, often without much effort. Let's look at an example where you will
5. Completely switch out the binomial model
completely switch out the binomial model we have worked with so far. Why would you want to do that? Well, you have some new data to analyze where the binomial model isn't going to work well. See, you happen to have a friend, also in the zombie e-commerce business, and she has offered to put up an ad for your site as a banner on her site, for a small daily fee, of course. The thing is, you wouldn't pay per view: the banner is always up, and you would pay per day. As a trial, she already put it up on her site for just one day, and during that time it resulted in 19 clicks and visits to your site. The question is: how many daily site visits should we expect, on average, if we pay for this banner? It's clear that the binomial model won't work directly here: it models the number of successes out of a total number of shown ads, but now the ad is shown all the time, and we just have
6. A model for counts per day
a daily count. What we could do is split each day into minutes and assume that some unknown proportion of minutes results in a success: a click on the ad. One problem with this generative model is that it can't handle more than one click per minute; each minute is either a success or not. We could, sort of, fix this by
7. A model for counts per day
splitting the day into seconds instead,
8. A model for counts per day
or why not milliseconds,
9. A model for counts per day
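or even smaller, each time shrinking the click probability per time slot so that the expected number of clicks per day stays the same. In the limit, we end up with another generative model that also has a name: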
10. The Poisson distribution
the Poisson process, and the number of events it produces per time unit follows the Poisson distribution. The Poisson distribution has one parameter: the mean number of events per unit of time, which in this case is the number of clicks per day. In R you can sample from the Poisson distribution using the rpois function and, just like with the rbinom function, you can plug in different parameter values and use it to generate simulated data.
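The limit argument can be checked numerically. Here is a minimal sketch in Python rather than R, assuming, purely for illustration, a true mean of 20 clicks per day. It computes the binomial probability of exactly 19 clicks when the day is split into minutes, seconds and milliseconds, with the per-slot click probability set to 20 divided by the number of slots, and compares it with the Poisson probability. The function names binom_pmf and poisson_pmf are made up for this sketch.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the mean count is lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

mean_clicks = 20  # assumed mean clicks per day, for illustration only
k = 19            # the observed number of clicks

# Split one day into ever finer slots; each slot holds at most one click.
slots_per_day = {
    "minutes": 24 * 60,
    "seconds": 24 * 60 * 60,
    "milliseconds": 24 * 60 * 60 * 1000,
}

for name, n in slots_per_day.items():
    p = mean_clicks / n  # per-slot probability that keeps the daily mean at 20
    print(f"{name:>12}: P(19 clicks) = {binom_pmf(k, n, p):.6f}")

print(f"{'Poisson':>12}: P(19 clicks) = {poisson_pmf(k, mean_clicks):.6f}")
```

As the slots get finer, the binomial probabilities should move closer and closer to the Poisson one, which is the sense in which the Poisson distribution is the limit of the binomial model.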
11. The Poisson distribution
For example, if we knew that the mean number of clicks per day was 20, this would be the probability distribution over how many clicks you could expect to get, say, tomorrow. Now, of course, for the banner ad we don't know the underlying mean number of clicks, so
12. Let's find out in the exercises!
let's try to find that out in the next couple of exercises by completely replacing the Bayesian binomial model with a Poisson model.
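The "mean of 20 clicks per day" example can be reproduced with a short simulation. This is a minimal sketch in Python rather than R: poisson_pmf gives the exact probability of each count (the quantity R's dpois returns), while sample_poisson is a hypothetical helper based on Knuth's multiplication method, standing in for rpois because Python's standard library has no Poisson sampler. The mean of 20, the seed and the number of draws are illustrative choices only.

```python
import math
import random
from collections import Counter

def poisson_pmf(k: int, lam: float) -> float:
    """Exact probability of k events when the mean count is lam > 0 (like R's dpois).

    Computed on the log scale so that large k does not overflow.
    """
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) count using Knuth's multiplication method.

    Hypothetical stand-in for R's rpois: multiply uniform draws until the
    product falls to exp(-lam) or below; the number of extra draws needed
    is the count. Fine for moderate lam such as 20.
    """
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

mean_clicks = 20         # suppose we knew the banner's mean clicks per day
n_draws = 100_000        # arbitrary number of simulated days
rng = random.Random(42)  # arbitrary seed, for reproducibility

# Simulate many possible "tomorrows" and tally how often each count appears.
draws = [sample_poisson(mean_clicks, rng) for _ in range(n_draws)]
tally = Counter(draws)

for clicks in (10, 15, 19, 20, 25, 30):
    exact = poisson_pmf(clicks, mean_clicks)
    simulated = tally[clicks] / n_draws
    print(f"{clicks:>2} clicks: exact {exact:.4f}, simulated {simulated:.4f}")

# Probability of seeing 19 clicks or fewer on a given day.
p_at_most_19 = sum(poisson_pmf(k, mean_clicks) for k in range(20))
print(f"P(at most 19 clicks) = {p_at_most_19:.4f}")
```

The simulated frequencies should land close to the exact probabilities. Since the mean is unknown for the banner ad, a Bayesian Poisson model would treat it as a parameter with its own prior, much as the binomial model did with the underlying proportion of clicks.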