1. Poisson regression
The Bayesian regression models we've explored thus far have one thing in common: they utilize a Normal likelihood model. Luckily, it's quite easy to generalize these regression techniques to non-Normal settings. Here we'll focus on just one generalized linear model technique, Poisson regression.
2. Normal likelihood structure
Our running example of rail-trail volume provides some motivation. Previously, we assumed that volume $Y$ varies Normally from day to day. The roughly bell-shaped histogram of the observed volume data suggests that this assumption isn't too unreasonable. But technically, there are some flaws.
Mainly, the Normal model assumes that $Y$ has a continuous scale and can take on any value on the real line, including negative numbers.
But $Y$, the number of users on a given day, is a discrete count and cannot be negative.
3. The Poisson model
The Poisson model offers an alternative.
It appropriately assumes that $Y$ is the number of independent events that occur in a fixed interval (here, the number of trail users on a given day: 0, 1, 2, and so on).
Further, the single rate parameter $l$ represents the typical number of events per time interval (here, the typical number of users per day, which must be a positive number).
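For reference, the standard Poisson probability mass function, written with the rate symbol $l$ used here, is shown below. Its mean and variance are both equal to $l$.

$$P(Y = y) = \frac{l^{y} e^{-l}}{y!}, \qquad y \in \{0, 1, 2, \ldots\}, \qquad E(Y) = \text{Var}(Y) = l$$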
The histogram here summarizes the distribution of 10,000 draws from a Poisson model with a rate parameter of 2. The majority of these draws are between 0 and 3. None exceed 10.
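A histogram like this one can be reproduced with a few lines of base R. This is a minimal sketch: the seed is arbitrary, so the exact draws will differ from those on the slide.

```r
# Simulate 10,000 draws from a Poisson model with rate 2
set.seed(1)  # arbitrary seed, used only for reproducibility
draws <- rpois(n = 10000, lambda = 2)

# Share of simulated draws between 0 and 3, and the largest draw
mean(draws <= 3)
max(draws)

# Theoretical counterparts
ppois(3, lambda = 2)                        # P(Y <= 3), about 0.857
ppois(10, lambda = 2, lower.tail = FALSE)   # P(Y > 10), about 8.3e-06

# One bar per integer count
hist(draws, breaks = seq(-0.5, max(draws) + 0.5, by = 1))
```

Because the chance of exceeding 10 is so small, a sample of 10,000 draws will usually, though not always, contain no values above 10.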
4. The Poisson model
Increasing the rate at which events occur to, say, 5 shifts the typical value of $Y$ to around 5 as well.
5. The Poisson model
This trend continues as we increase the rate parameter from 5 to 10 events per interval...
6. The Poisson model
and from 10 to 20 events per interval.
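The trend across these rates can be checked by simulation. The sketch below draws 10,000 values at each rate discussed above and compares the sample mean and variance with the rate; the seed and the helper name `summary_by_rate` are arbitrary choices.

```r
set.seed(1)  # arbitrary seed, used only for reproducibility
rates <- c(2, 5, 10, 20)

# For each rate, simulate 10,000 Poisson draws and summarize them
summary_by_rate <- sapply(rates, function(l) {
  y <- rpois(10000, lambda = l)
  c(rate = l, mean = mean(y), variance = var(y))
})

# One row per rate; mean and variance should both sit near the rate
round(t(summary_by_rate), 2)
```

As the rate grows, both the center and the spread of the draws grow with it.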
7. Poisson regression
Let's reconsider our regression model of trail volume $Y$ by weekday status $X$ and temperature $Z$ using a Poisson, instead of Normal, likelihood model.
8. Poisson regression
A natural place to start is to define the rate parameter $l$, the typical volume, as the linear combination of $X$ and $Z$.
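In symbols, with $i$ indexing days (an index assumed here to match the `[i]` notation used in the RJAGS code), this first attempt reads:

$$Y_i \sim \text{Pois}(l_i), \qquad l_i = a + b X_i + c Z_i$$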
9. Poisson regression
But there's a problem with linking $l$ directly to the linear model: a linear combination of $X$ and $Z$ can take negative values, whereas a Poisson rate must be positive. Thus, this strategy doesn't preserve the properties of the Poisson model.
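A small numeric sketch makes the problem concrete. The coefficient values and the 0/1 weekday coding below are hypothetical, chosen only so that the linear combination goes negative.

```r
# Hypothetical coefficients, chosen only to illustrate the problem
a_coef <- -20
b_coef <- 5
c_coef <- 0.5

X <- 0    # weekday indicator (assumed coding: 1 = weekday, 0 = weekend)
Z <- 30   # temperature

# Rate linked directly to the linear model
l_identity <- a_coef + b_coef * X + c_coef * Z   # -20 + 0 + 15 = -5
l_identity

# A negative rate is not a valid Poisson parameter, so R returns NA
suppressWarnings(rpois(1, lambda = l_identity))
```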
10. Poisson regression
Alternatively, we can use a log link function to link rate parameter $l$ to the linear model. That is, we can model the log trend in volume by the linear combination of $X$ and $Z$. In turn, $l$ is the exponentiation of this model.
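Written out, with $i$ again indexing days (an assumed index), the log link and its inverse are:

$$Y_i \sim \text{Pois}(l_i), \qquad \log(l_i) = a + b X_i + c Z_i \quad \Longleftrightarrow \quad l_i = e^{a + b X_i + c Z_i}$$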
11. Poisson regression
Because $l$ is defined through exponentiation, it's guaranteed to be positive and thus preserves the properties of the Poisson model.
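The positivity guarantee is easy to confirm numerically. In the sketch below, `eta` holds hypothetical values of the linear combination, including negative ones.

```r
# Hypothetical values of the linear combination on the log scale
eta <- c(-5, -1, 0, 2, 6)

# Inverse of the log link: exponentiate to recover the rate
l <- exp(eta)
l

# Every rate is strictly positive, whatever the sign of eta
all(l > 0)
```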
12. Poisson regression in RJAGS
To complete the Bayesian Poisson regression model, we place priors on the parameters $a$, $b$, and $c$. On the log scale, these priors are still quite vague.
Defining this model in RJAGS requires the familiar skeleton...
13. Poisson regression in RJAGS
and the definition of the Normal priors is no different from our previous model.
14. Poisson regression in RJAGS
The only new syntax is in the likelihood definition. Here we use `dpois()` to specify a Poisson model for `Y[i]` with rate `l[i]`.
15. Poisson regression in RJAGS
Subsequently, we specify that the log transformation of `l[i]` is defined by the linear combination of `X[i]` and `Z[i]`.
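Putting the pieces together, the model might be sketched as follows. Several details are assumptions rather than values given above: the prior precisions, the numeric 0/1 coding of `X`, the simulated stand-in data, and the object names `poisson_model`, `poisson_jags`, and `poisson_sim`. The compile and sampling step runs only if the `rjags` package and JAGS itself are installed.

```r
# Sketch of the Poisson regression model in JAGS syntax.
# Prior precisions are illustrative assumptions, not values stated in the text.
poisson_model <- "model{
    # Likelihood model for Y[i]
    for(i in 1:length(Y)) {
        Y[i] ~ dpois(l[i])
        log(l[i]) <- a + b * X[i] + c * Z[i]
    }

    # Vague Normal priors (JAGS parameterizes dnorm by mean and precision)
    a ~ dnorm(0, 200^(-2))
    b ~ dnorm(0, 2^(-2))
    c ~ dnorm(0, 2^(-2))
}"

# Simulated stand-in data (hypothetical; replace with the real trail data)
set.seed(1)
n <- 90
X <- rbinom(n, size = 1, prob = 5 / 7)     # 1 = weekday (assumed coding)
Z <- round(runif(n, min = 40, max = 90))   # temperature
Y <- rpois(n, lambda = exp(5 - 0.1 * X + 0.01 * Z))

# Compile and sample only if rjags (and JAGS) are available
if (requireNamespace("rjags", quietly = TRUE)) {
  poisson_jags <- rjags::jags.model(
    textConnection(poisson_model),
    data = list(Y = Y, X = X, Z = Z),
    inits = list(.RNG.name = "base::Wichmann-Hill", .RNG.seed = 10)
  )
  poisson_sim <- rjags::coda.samples(
    model = poisson_jags,
    variable.names = c("a", "b", "c"),
    n.iter = 10000
  )
  summary(poisson_sim)
}
```

Note that `dnorm()` in JAGS takes a precision (one over the variance) as its second argument, which is why the priors are written with `^(-2)`.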
16. Caveats
Of course, no model is perfect. Just as there were caveats for Normal regression, there are caveats for Poisson regression.
Mainly, the Poisson model assumes that among days with similar temperatures and weekday status, the variance in volume is equal to the *mean* volume.
However, our sample trail data demonstrate potential overdispersion: the variance in observed volume is larger than the mean volume.
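One rough way to look for overdispersion is to compare the ratio of variance to mean, which should be near 1 for Poisson counts. The sketch below uses simulated data only; the negative binomial draws serve purely as a hypothetical example of overdispersed counts, and `dispersion()` is a helper defined here, not a built-in function. With real data, the comparison is best made within groups of days that share similar temperatures and weekday status.

```r
set.seed(1)  # arbitrary seed, used only for reproducibility
n <- 10000

# Poisson counts: variance is close to the mean
y_pois <- rpois(n, lambda = 20)

# Overdispersed counts with the same mean (hypothetical comparison);
# for this parameterization, variance = mu + mu^2 / size = 220
y_over <- rnbinom(n, mu = 20, size = 2)

# Helper: ratio of sample variance to sample mean
dispersion <- function(y) var(y) / mean(y)

dispersion(y_pois)   # near 1
dispersion(y_over)   # well above 1
```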
Though we could define a new model that accommodates this overdispersion, we won't. Our imperfect model is an OK place to start.
17. Let's practice!
It's your turn. Enjoy one last set of RJAGS exercises!