1. Markov Chain Monte Carlo and model fitting
Welcome back!
2. Bayesian data analysis in production

We've seen two ways of obtaining posterior draws: grid approximation, and choosing conjugate priors so that we can sample from a known posterior, each with its own limitations. In practice, a third method is usually used: Markov Chain Monte Carlo, or MCMC, which allows us to sample even from unknown posteriors. All three methods yield the same results, but MCMC is the most flexible: it works with any model and any priors. Let's look at its two building blocks: Monte Carlo and Markov Chains.
3. Monte Carlo

Monte Carlo is a way to approximate a quantity by generating random numbers. Consider a circle with radius 5. From the area formula, pi times the radius squared, we know its area to be 78.5.
4. Monte Carlo

But how might we approximate the area without the formula? To start, we can draw a 10-by-10 square around the circle.
5. Monte Carlo

Then, we generate random points inside the square. The more points, the more precise the approximation. Let's use only 25. 19 out of 25 points, or 76%, fell inside the circle, meaning the circle's area is roughly 76% of the square's area of 100: that gives 76. Not bad using random numbers!
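To make this concrete, here is a minimal sketch of the same experiment in Python with NumPy, using many more points than 25 (all variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)

n_points = 100_000  # more points give a more precise approximation
radius = 5

# Sample points uniformly inside a 10-by-10 square centered on the circle
x = rng.uniform(-radius, radius, n_points)
y = rng.uniform(-radius, radius, n_points)

# The fraction of points falling inside the circle, times the square's area
inside = x**2 + y**2 <= radius**2
area_estimate = inside.mean() * (2 * radius) ** 2

print(area_estimate)  # close to pi * 5**2, about 78.5
```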
6. Markov Chains

Now, on to Markov Chains. They are models of a sequence of states, between which one transitions with given probabilities.
7. Markov Chains

Imagine a bear that only hunts, eats, and sleeps. The table shows the probabilities of transitioning between these three states. From the first row, we see that if the bear is hunting now, there is a 10% probability each that it will hunt or sleep next, and an 80% probability that it will eat.
8. Markov Chains

Some Markov Chains have the property that, after transitioning between states many times, they reach a so-called steady state. This means that no matter where the bear started, the probabilities of finding it in particular states in the distant future are the same.
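We can check this numerically. Below is a small sketch: the first row of the transition matrix is the one given in the example, while the other two rows are made-up values purely for illustration.

```python
import numpy as np

# Transition matrix over the bear's states: hunt, eat, sleep.
# The first row comes from the example; the other two are hypothetical.
P = np.array([
    [0.1, 0.8, 0.1],  # from hunting
    [0.2, 0.1, 0.7],  # from eating (made up)
    [0.5, 0.1, 0.4],  # from sleeping (made up)
])

# Start from two different states and transition many times
for start in ([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]):
    probs = np.array(start)
    for _ in range(100):
        probs = probs @ P
    print(probs)  # both starting points yield the same steady-state probabilities
```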
9. Markov Chain Monte Carlo

Let's put it all together! There are many MCMC samplers, but all work in a similar fashion. To get posterior draws for a parameter, we start by generating a random point.
10. Markov Chain Monte Carlo

Then we generate another one, close to the first: that's the Monte Carlo part, random generation. Then, we check how well this new point explains our data, that is, what the likelihood is with this value of the parameter. Then, we either accept or reject the new point. The better it explains the data, the higher the probability that we accept it.
11. Markov Chain Monte Carlo

Here, the new point explains our data well, and we accept it, which I denote in green.

12. Markov Chain Monte Carlo

Next, we sample another point, close to the last accepted point.

13. Markov Chain Monte Carlo

However, it doesn't explain the data well enough, so we reject it, denoted in red.

14. Markov Chain Monte Carlo

But we accept the next one.

15. Markov Chain Monte Carlo

And the next three.
16. Markov Chain Monte Carlo

Eventually, we have many accepted points. This generates a Markov Chain, and the probabilities of sampling specific values converge to the steady state, which is our posterior distribution. Finally, we discard some number of the first draws, called the burn-in, sampled before the Markov Chain had converged, since they are simply random. The remaining ones are our posterior draws.
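To illustrate the procedure, here is a minimal sketch of a Metropolis sampler, one of the simplest MCMC algorithms. It assumes normally distributed data with a known standard deviation of 1 and a flat prior, so the acceptance probability reduces to the likelihood ratio; the data and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: we want posterior draws for the mean of a normal distribution
data = rng.normal(loc=3.0, scale=1.0, size=50)

def log_likelihood(mu):
    # How well does this value of the parameter explain the data?
    return -0.5 * np.sum((data - mu) ** 2)

n_draws, burn_in = 1000, 500
current = rng.normal()  # start from a random point
draws = []
for _ in range(n_draws + burn_in):
    proposal = current + rng.normal(scale=0.5)  # a new point close to the last one
    # Accept with probability min(1, likelihood ratio): the better the proposal
    # explains the data, the more likely we are to accept it
    if np.log(rng.uniform()) < log_likelihood(proposal) - log_likelihood(current):
        current = proposal
    draws.append(current)

posterior_draws = draws[burn_in:]  # discard the burn-in draws
print(np.mean(posterior_draws))    # close to the true mean of 3.0
```

In practice, we don't write samplers by hand; libraries like PyMC3 do it for us, as we'll see next.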
17. Aggregated ads data

Here is the previous ads data. We want to predict num_clicks with a regression model using clothes_banners_shown and sneakers_banners_shown.
18. Linear regression with PyMC3

We start by defining the regression formula, consisting of the response, a tilde sign, and the predictors separated by plus signs. We open a with statement with a pymc3 Model instance as the model. Inside it, we define the model by calling pm.GLM.from_formula with the formula and the data passed as arguments. GLM stands for generalized linear model, a class of models that includes linear regression. We could define the priors and the likelihood here, but the defaults are good for linear regression. We can print the model to see the priors for the parameters and, at the bottom, the normal likelihood for our target variable. Finally, we call pm.sample to generate 1000 valid draws, plus 500 burn-in draws set by the tune parameter. This output is conventionally called a trace.
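Putting those steps together, the code looks roughly like this, assuming the ads data is in a pandas DataFrame named ads (the variable name is illustrative):

```python
import pymc3 as pm

# Response on the left of the tilde, predictors separated by plus signs
formula = "num_clicks ~ clothes_banners_shown + sneakers_banners_shown"

with pm.Model() as model:
    # Sets up default priors and a normal likelihood for num_clicks
    pm.GLM.from_formula(formula, data=ads)
    print(model)  # inspect the priors and, at the bottom, the likelihood
    # 1000 valid draws, after 500 burn-in draws set by tune
    trace = pm.sample(draws=1000, tune=500)
```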
19. Let's practice MCMC!

Let's practice!