Bayesian model comparisons

1. Bayesian model comparisons

So far we've looked at how to determine if a single model fits our observed data. But what do we do if we have multiple models and want to figure out which one fits best? This is known as model comparison. In this lesson, we'll talk about how we can compare two or more regression models that were estimated using rstanarm.

2. The loo package

When we estimate a model with rstanarm, we can use the loo package for model comparisons. LOO stands for "leave-one-out," which is one variety of a class of model comparison tools more widely known as cross-validation. The loo package doesn't perform true cross-validation. Instead, the algorithm approximates leave-one-out cross-validation. The details of exactly how this is done are beyond the scope of this course, but if you want to learn more, the loo package documentation provides many resources for exploration. In this lesson, we'll instead focus on how the loo package can be used to compare models and how to interpret its output.

3. Using loo on a single model

To view the LOO estimates for a model, we simply pass an estimated model as the argument to the loo() function. This gives us several pieces of information. First, we see that the estimates were computed from a log-likelihood matrix with 4,000 rows (the total number of iterations in our posterior) and 434 columns (the total number of observations in the kidiq dataset). We get the LOO estimate, called elpd_loo, the effective number of parameters in the model, p_loo, and the LOO estimate converted to a deviance scale, looic. The deviance scale is simply minus two times the LOO estimate, but is more common in some fields. Finally, we're provided with some diagnostics from the approximation algorithm. However, these numbers aren't very useful in isolation. For example, what does an elpd_loo value of -1878.5 mean? This value really only has meaning relative to the values of competing models.
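Here's a minimal sketch of that workflow, assuming the kidiq data shipped with rstanarm and the default sampler settings (4 chains with 1,000 post-warmup draws each, giving the 4,000 iterations mentioned above); the object name one_mod is illustrative:

```r
library(rstanarm)

# kidiq ships with rstanarm: 434 kids' IQ scores plus maternal predictors
data(kidiq)

# Fit a one-predictor regression: kid's IQ predicted from mom's IQ
one_mod <- stan_glm(kid_score ~ mom_iq, data = kidiq)

# View the LOO estimates: elpd_loo, p_loo, looic, and diagnostics
loo(one_mod)
```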

4. Model comparisons with loo

Let's say that we have two possible models. In the first model, we predict a kid's IQ score from only their mother's IQ. In the second model, we predict the kid's IQ score not only from their mom's IQ, but also from whether or not their mom graduated high school, and from the interaction between mom's IQ and mom's high school completion. We want to know which model does a better job of predicting the kid's IQ score. To do this, we can save the loo estimates from the one-predictor and two-predictor models, and then use the `compare()` function.
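A sketch of that comparison, reusing one_mod from the previous example; the formula syntax `mom_iq * mom_hs` expands to both main effects plus their interaction, and the object names are illustrative. Note that in recent versions of the loo package, `compare()` has been superseded by `loo_compare()`:

```r
# Fit the two-predictor model with the interaction term
two_mod <- stan_glm(kid_score ~ mom_iq * mom_hs, data = kidiq)

# Save the LOO estimates for each model
loo_1 <- loo(one_mod)
loo_2 <- loo(two_mod)

# Compare the models: reports elpd_diff and its standard error
compare(loo_1, loo_2)
```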

5. Model comparisons with loo

The compare() function provides us with the difference in loo estimates, along with a standard error of the difference. A positive difference score, like in this example, means that the second model is favored (the model with both predictors), whereas a negative score would indicate a preference for the first model. The standard error helps us decide if the difference is meaningful. As a rule of thumb, if the absolute value of the difference is less than the standard error, the models don't perform differently, and we should choose the simpler model (the one with fewer parameters) because it is more parsimonious. If the absolute value of the difference is greater than the standard error, then we prefer the model indicated by the sign of the difference. In this example, because the difference of 6.1 is positive and greater than the standard error of 3.9, we would choose the second model with both predictors.
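The rule of thumb can also be checked programmatically. A sketch, assuming the loo_1 and loo_2 objects from the previous example; when exactly two models are compared, compare() returns a named vector with elements elpd_diff and se:

```r
comparison <- compare(loo_1, loo_2)

# TRUE if the difference is meaningful: positive elpd_diff favors the
# second model, negative favors the first; otherwise keep the simpler model
abs(comparison["elpd_diff"]) > comparison["se"]
```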

6. Let's practice!

Now it's your turn to compare models.
