
Bayesian regression in RJAGS

1. Bayesian regression in RJAGS

Now that we've engineered the structure of a Bayesian regression model, let's simulate the posterior model in RJAGS.

2. Bayesian regression model

Recall our interest in modeling adult weights $Y_i$ by heights $X_i$. We assumed that weight is linearly related to height, yet individuals deviate from the trend. Specifically, weights $Y_i$ are Normally distributed around the trend $m_i$ with residual standard deviation $s$. We utilized our prior knowledge of this biological relationship to engineer priors for the three regression parameters: $a$, $b$, and $s$.
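
In symbols, this model structure is

$$Y_i \sim N(m_i, s^2), \qquad m_i = a + b X_i,$$

where $a$ and $b$ are the $y$-intercept and slope of the trend and $s$ is the residual standard deviation.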

3. Prior insight

These prior insights reflect uncertainty about the $y$-intercept $a$, relative certainty that there's a positive slope $b$ (thus a positive association between weight and height), and vague information that the residual standard deviation is equally likely to be anywhere between 0 and 20 kg.
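
For concreteness, one prior specification consistent with these insights is

$$a \sim N(0, 200^2), \qquad b \sim N(1, 0.5^2), \qquad s \sim \text{Unif}(0, 20),$$

where the Normal hyperparameters for $a$ and $b$ are illustrative assumptions; only the Unif(0, 20) prior on $s$ is pinned down by the insights above.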

4. Insight from the observed weight & height data

The observed `bdims` data on 507 adults also provide insight into the regression parameters. Obtained with the `lm()` function, the line of best fit for these sample data has a $y$-intercept of -105.01 kg and a slope of 1.02 kg/cm. Further, the observed residual standard deviation is 9.3 kg.
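
A sketch of this frequentist fit, assuming the weight and height columns of `bdims` are `wgt` (kg) and `hgt` (cm), as in the `openintro` package:

```r
library(openintro)  # assumed source of the bdims data

# Frequentist line of best fit for weight vs. height
weight_lm <- lm(wgt ~ hgt, data = bdims)
coef(weight_lm)           # y-intercept of roughly -105.01, slope of roughly 1.02
summary(weight_lm)$sigma  # residual standard deviation of roughly 9.3
```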

5. DEFINE the regression model

With the prior and data insights in place, we're ready to define, compile, and simulate the posterior regression model in RJAGS. First, we'll define the model string `weight_model`.

6. DEFINE the regression model

Starting with the definition of the likelihood structure, we use a `for(i in 1:length(Y))` statement to indicate that we observed weights $Y_i$ for a collection of subjects, here 507 of them.

7. DEFINE the regression model

Inside the for loop, we specify that weights are Normally distributed around trend `m[i]` with standard deviation `s`. Since JAGS parameterizes the Normal by precision, the inverse variance, we supply `s^(-2)` rather than `s` itself. The square bracketing of `m[i]` in the RJAGS code indicates that the trend is subject dependent.

8. DEFINE the regression model

Earlier, we identified a linear trend between weight and height, thus $m_i = a + bX_i$. Note that this trend is completely deterministic: $m_i$ is defined by parameters $a$ and $b$ and height $X_i$. Thus in RJAGS, we define `m[i]` using the assignment operator `<-`, not the model tilde `~`.

9. DEFINE the regression model

To complete the model definition, we specify the prior models for parameters $a$, $b$, and $s$. There's nothing new here.
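
Putting these pieces together, here is a sketch of the full model string. The Normal prior hyperparameters for `a` and `b` are illustrative assumptions; the Unif(0, 20) prior on `s` follows the prior insights above.

```r
# DEFINE the model (prior hyperparameters for a and b are assumed)
weight_model <- "model{
    # Likelihood: weights Y[i] are Normal around trend m[i];
    # JAGS parameterizes the Normal by precision, hence s^(-2)
    for(i in 1:length(Y)) {
        Y[i] ~ dnorm(m[i], s^(-2))
        m[i] <- a + b * X[i]   # deterministic linear trend
    }

    # Priors for the three regression parameters
    a ~ dnorm(0, 200^(-2))   # vague prior on the y-intercept (assumed)
    b ~ dnorm(1, 0.5^(-2))   # prior favoring a positive slope (assumed)
    s ~ dunif(0, 20)         # residual sd anywhere between 0 and 20 kg
}"
```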

10. COMPILE the regression model

Next, to *compile* this model using `jags.model()`, we provide a `textConnection()` to the defined `weight_model` string and a list of our observed data. There are two variables in our model, weight `Y` and height `X`, thus we must supply data for both. Here the 507 pairs of weight and height data are stored in `bdims`.
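
A minimal sketch, again assuming the weight and height columns of `bdims` are `wgt` and `hgt`:

```r
library(rjags)

# COMPILE: supply the model string and the observed data
weight_jags <- jags.model(
    textConnection(weight_model),
    data = list(Y = bdims$wgt, X = bdims$hgt)
)
```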

11. SIMULATE the regression model

Finally, we *simulate* a sample of size 10,000 from the posterior by applying `coda.samples()` to the compiled `weight_jags` model and by specifying our three parameters of interest.
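
In code:

```r
# SIMULATE: draw 10,000 posterior samples of a, b, and s
weight_sim <- coda.samples(
    model = weight_jags,
    variable.names = c("a", "b", "s"),
    n.iter = 10000
)
```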

12. SIMULATE the regression model

The results aren't great. After 10,000 iterations, the `a` and `b` chains don't appear to have stabilized. In fact, we see a negative correlation between these chains: as `b` values increase, `a` values decrease. This is somewhat expected: if we change the slope of a regression line, the intercept must shift to compensate.
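
The trace and density plots behind this diagnosis come from coda's plot method:

```r
# Trace and density plots for the 10,000-iteration chains
plot(weight_sim)
```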

13. Addressing Markov chain instability

There are a couple of strategies for improving our Markov chain stability. First, we might standardize the height predictor by subtracting the mean and dividing by the standard deviation. Since that's outside the scope of our current discussion, we'll do something lazier: simply increase the chain length.

14. SIMULATE the regression model

Luckily, after 100,000 iterations, the chains do appear to stabilize.
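
In code, we simply increase `n.iter`:

```r
# SIMULATE a longer chain of 100,000 iterations
weight_sim_big <- coda.samples(
    model = weight_jags,
    variable.names = c("a", "b", "s"),
    n.iter = 100000
)
plot(weight_sim_big)
```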

15. Posterior insights

Examining the density plots of the posterior chains (in dark gray) relative to the priors (in teal), notice that we have far more certainty about the parameters. Further, due to the large sample size of 507 subjects in the `bdims` data, the posterior models are nearly indistinguishable from the likelihoods. Thus the latter are omitted here.
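
Numerical posterior summaries are also available via coda's `summary()` method:

```r
# Posterior means, standard deviations, and quantiles for a, b, and s
summary(weight_sim_big)
```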

16. Let's practice!

Now it's your turn to simulate and explore the regression model using RJAGS.