1. Multivariate Bayesian regression
We now have the skills to build Bayesian regression models with one predictor. It's not so hard to generalize these to multivariate regression models with two or more predictors. Consider some motivation.
2. Modeling volume
Let $Y_i$ be rail-trail volume, or number of users, on a given day $i$.
3. Modeling volume by weekday
In previous exercises, you modeled $Y_i$ by $X_i$, a categorical indicator of weekday status. Your posterior analysis of this relationship indicated that volume tended to be higher on weekends than on weekdays.
4. Modeling volume by temperature
Weather might also explain some of the variability in trail volume from day to day. For example, let $Z_i$ be the high temperature on day $i$ (in degrees Fahrenheit). Rail-trail managers have noticed a positive linear association between $Y_i$ and $Z_i$: volume tends to be higher on warmer days than on colder days.
5. Modeling volume by temperature & weekday
Luckily, if weekday status and temperature both explain some of the variability in trail volume, there's no need to pick just one of these predictors for our Bayesian model. Rather, we can build a multivariate model of volume $Y_i$ that incorporates both weekday status $X_i$ and temperature $Z_i$.
As usual, we'll start by assuming that daily volume is Normally distributed around some trend $m_i$ with residual standard deviation $s$, that is, $Y_i \sim N(m_i, s^2)$.
The dependence of the trend on weekday status and temperature can be written as $m_i = a + bX_i + cZ_i$.
Consider the trend for weekends. In this case $X_i$ is 0 and $m_i$ simplifies to $a + cZ_i$.
In contrast, for weekdays, $X_i$ is 1 and $m_i$ reduces to $(a + b) + cZ_i$.
6. Modeling volume by temperature & weekday
These equations represent two separate linear trends between volume and temperature for weekends and weekdays.
7. Modeling volume by temperature & weekday
Let's use these to puzzle out the meaning of regression parameters $a$, $b$, and $c$.
First, notice that $a$ represents the weekend y-intercept whereas $a + b$ represents the weekday y-intercept.
Thus $b$ represents the contrast between the intercepts. Equivalently, since the trend lines are parallel, $b$ is the vertical distance between the two trend lines at any given temperature. As such, $b$ measures the contrast in volume between weekdays and weekends of the same temperature.
Moving on, both trend lines have a slope of $c$. Thus, no matter the day of the week, $c$ measures the change in volume per 1 degree increase in temperature.
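To make these interpretations concrete, here is a minimal sketch in Python (used here only for illustration; the course itself works in RJAGS). The helper `trend` and the values of `a`, `b`, and `c` are hypothetical, picked so the arithmetic is easy to follow, and are not estimates from the trail data.

```python
def trend(a, b, c, x, z):
    """Trend m_i = a + b*x + c*z, where x is 1 for weekdays and 0 for weekends."""
    return a + b * x + c * z

# Hypothetical parameter values, chosen only for illustration
a, b, c = 100.0, -50.0, 5.0

# The vertical gap between the weekday and weekend lines at two temperatures
gap_at_60 = trend(a, b, c, x=1, z=60) - trend(a, b, c, x=0, z=60)
gap_at_80 = trend(a, b, c, x=1, z=80) - trend(a, b, c, x=0, z=80)

# The change in trend per 1-degree increase, on each type of day
slope_weekend = trend(a, b, c, x=0, z=71) - trend(a, b, c, x=0, z=70)
slope_weekday = trend(a, b, c, x=1, z=71) - trend(a, b, c, x=1, z=70)

print(gap_at_60, gap_at_80)          # both equal b
print(slope_weekend, slope_weekday)  # both equal c
```

Because the lines are parallel, the gap is the same at every temperature, and the slope is the same on both types of day.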
Finally, $s$ is the residual standard deviation, which measures how much daily volume typically varies around the trend.
8. Priors for $a$ and $b$
Consider the following vague priors for these regression parameters. First, the prior for $a$ reflects a lack of certainty about the y-intercept for the relationship between temperature and weekend volume.
The prior for $b$ also spans a large range of positive and negative values, and thus reflects a lack of certainty about how typical volume compares on weekdays vs weekends of similar temperature. It could be more, it could be less.
9. Priors for $c$ and $s$
Further, whether on weekdays or weekends, we lack certainty about the association between trail volume & temperature (that is, the slope of the model lines).
Finally, we assume that the residual standard deviation is equally likely to be anywhere between 0 and 200 users.
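One way to get a feel for vague priors like these is to draw from them. The sketch below uses Python's standard library purely for illustration. The Normal form and the scales `SD_A`, `SD_B`, and `SD_C` are assumptions, since the text does not state them; only the Uniform(0, 200) prior for $s$ comes from the description above.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# ASSUMED prior scales (not given in the text): vague Normal priors centered at 0
SD_A, SD_B, SD_C = 200.0, 200.0, 20.0

def draw_prior():
    """Draw one (a, b, c, s) tuple from the vague priors."""
    a = random.gauss(0, SD_A)   # weekend intercept (assumed prior)
    b = random.gauss(0, SD_B)   # weekday vs weekend contrast (assumed prior)
    c = random.gauss(0, SD_C)   # temperature slope (assumed prior)
    s = random.uniform(0, 200)  # residual standard deviation (from the text)
    return a, b, c, s

draws = [draw_prior() for _ in range(1000)]

# With a prior centered at 0, roughly half of the draws of b are positive:
# weekday volume "could be more, it could be less".
share_b_positive = sum(b > 0 for _, b, _, _ in draws) / len(draws)
print(share_b_positive)
```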
10. Bayesian model of volume by weekday status & temperature
The multivariate Bayesian regression model of trail volume by weekday status and temperature is summarized here.
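Written out, the model has roughly the following structure. The likelihood, the trend, and the Uniform prior for $s$ follow the description above. The Normal form of the priors for $a$, $b$, and $c$, and the placeholder symbols $\mu_a, \tau_a, \mu_b, \tau_b, \mu_c, \tau_c$, are assumptions standing in for values that are not stated in the text.

```latex
\begin{aligned}
Y_i &\sim N(m_i,\, s^2) \\
m_i &= a + bX_i + cZ_i \\
a &\sim N(\mu_a,\, \tau_a^2) \\
b &\sim N(\mu_b,\, \tau_b^2) \\
c &\sim N(\mu_c,\, \tau_c^2) \\
s &\sim \text{Unif}(0,\, 200)
\end{aligned}
```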
11. DEFINE the Bayesian model in RJAGS
In the next exercises you'll simulate the corresponding posterior in RJAGS. Though the multivariate setting is new, it doesn't much complicate our code. It simply adds an extra term to the definition of trend `m[i]` and an extra parameter, and thus an extra prior, to the model.
12. Let's practice!
Time to prove to yourself that working with multivariate models in RJAGS is refreshingly straightforward.