
Poisson and quasipoisson regression to predict counts

1. Poisson and quasipoisson regression to predict counts

In this lesson, you will learn about regression to predict count data.

2. Predicting Counts

Predicting counts is a non-linear problem, because counts are restricted to non-negative integers. The counterpart to linear regression for count data is Poisson or quasipoisson regression.

3. Poisson/Quasipoisson Regression

Like logistic regression, Poisson regression is a generalized linear model. It assumes that the inputs are additive and linear with respect to the log of the expected outcome. In R, you will again use the glm function, with family set to either poisson or quasipoisson.
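As a minimal sketch of the glm call described here, the example below fits a Poisson model to simulated data. The data frame df and the columns x1, x2, and y are hypothetical names made up for illustration, not part of the lesson's dataset.

```r
# Simulated data for illustration only; df, x1, x2, and y are hypothetical.
set.seed(1)
df <- data.frame(x1 = runif(200), x2 = runif(200))

# The log of the expected count is linear and additive in the inputs.
df$y <- rpois(200, lambda = exp(0.5 + 1.2 * df$x1 - 0.8 * df$x2))

# Same formula interface as lm(); the family argument selects the model type.
model <- glm(y ~ x1 + x2, data = df, family = poisson)

# Coefficients are on the log scale.
coef(model)

# Exponentiating gives the multiplicative effect on the expected count.
exp(coef(model))
```

Swapping family = poisson for family = quasipoisson is the only change needed to fit the quasipoisson variant.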

4. Poisson/Quasipoisson Regression

For Poisson regression, the outcome is an integer that represents a count or a rate, like the number of traffic tickets a driver gets in a year, or the number of visits to a website per day. The model returns an expected rate that is not necessarily an integer, for instance the expected number of visits to a website per day.

5. Poisson vs. Quasipoisson

Poisson regression assumes that the process producing the count has a Poisson distribution, where the mean equals the variance. For many real-life processes, the variance will be quite different from the mean. In this case, you should use quasipoisson regression. Poisson and quasipoisson regression work better on larger datasets. If the counts that you want to predict are much larger than zero, ordinary linear regression will often work fine as well. Let's see an example.
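One way to see the difference, sketched on simulated data: the counts below are drawn from a negative binomial distribution so that their variance exceeds their mean. The variable names and the simulation settings are assumptions chosen for illustration. Comparing the overall mean and variance is only a rough check, since the Poisson assumption applies to the counts given the inputs.

```r
# Simulated overdispersed counts; all names and settings are hypothetical.
set.seed(2)
n <- 500
x <- runif(n)
y <- rnbinom(n, mu = exp(1 + 1.5 * x), size = 1)

# Rough check: compare the mean and the variance of the outcome.
mean(y)
var(y)

fit_pois  <- glm(y ~ x, family = poisson)
fit_quasi <- glm(y ~ x, family = quasipoisson)

# Both families give the same coefficient estimates.
all.equal(coef(fit_pois), coef(fit_quasi))

# Quasipoisson estimates a dispersion parameter from the data and
# scales the standard errors by it; poisson fixes it at 1.
summary(fit_quasi)$dispersion
```

Because the point estimates agree, predictions from the two fits are the same; the choice mainly affects the standard errors and any inference drawn from them.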

6. Example: Predicting Bike Rentals

Here we have hourly data from a bike sharing system in Washington, D.C., detailing the number of bikes rented during the first 2 weeks of January. We want a model to predict the hourly bike rental counts as a function of the time of day, the type of day (workday, weekend, or holiday), and details about the weather.

7. Fit the model

We’ll use January data for training. Because the variance of the bike rentals is much larger than the mean, we will use quasipoisson regression.
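The fitting step might look like the sketch below. The real bike sharing data is not reproduced here, so the code simulates a stand-in; bikes_train and the columns hr, workingday, temp, and cnt are hypothetical names, and the simulated values are not the lesson's numbers.

```r
# Hypothetical stand-in for two weeks of hourly rentals; not the real data.
set.seed(3)
n <- 24 * 14
bikes_train <- data.frame(
  hr = factor(rep(0:23, times = 14)),
  workingday = rep(rep(c(TRUE, FALSE), times = c(5, 2)), each = 24, times = 2),
  temp = runif(n, 0, 0.5)
)

# Simulate overdispersed hourly counts with a daily cycle.
mu <- exp(3 + sin(as.integer(bikes_train$hr) * pi / 12) + 2 * bikes_train$temp)
bikes_train$cnt <- rnbinom(n, mu = mu, size = 2)

# A variance far above the mean points to quasipoisson rather than poisson.
c(mean = mean(bikes_train$cnt), variance = var(bikes_train$cnt))

# Fit the count model on the training data.
bike_model <- glm(cnt ~ hr + workingday + temp,
                  data = bikes_train, family = quasipoisson)
```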

8. Check model fit

As with logistic regression, we can use pseudo-R-squared to check the goodness of fit. We can get the deviance and null deviance of the model using glance. This model explains about 76% of the deviance, not too bad.
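The pseudo-R-squared is one minus the ratio of the model deviance to the null deviance. The sketch below computes it on simulated data using the values stored on the fitted glm object; the data and variable names are hypothetical. The commented lines show the equivalent route through glance, assuming the broom package is installed.

```r
# Simulated data for illustration only; names are hypothetical.
set.seed(4)
df <- data.frame(x = runif(300))
df$y <- rnbinom(300, mu = exp(1 + 2 * df$x), size = 3)

model <- glm(y ~ x, data = df, family = quasipoisson)

# Pseudo-R-squared: share of the null deviance explained by the model.
pseudo_r2 <- 1 - model$deviance / model$null.deviance
pseudo_r2

# With broom installed, glance() exposes the same two quantities:
# perf <- broom::glance(model)
# 1 - perf$deviance / perf$null.deviance
```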

9. Predicting from the model

The predict function takes the model and the data to predict on. As with logistic regression, be sure to use type = "response" to get the predicted rates. Here, we apply the model to data from February, and also plot the predicted hourly rates versus the actual counts.
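To illustrate why type = "response" matters, the sketch below predicts on a simulated hold-out set both ways. The train and test data frames are hypothetical stand-ins for the January and February data.

```r
# Simulated train and test sets; hypothetical stand-ins for the real data.
set.seed(5)
train <- data.frame(x = runif(300))
train$y <- rpois(300, exp(1 + 2 * train$x))
test <- data.frame(x = runif(100))
test$y <- rpois(100, exp(1 + 2 * test$x))

model <- glm(y ~ x, data = train, family = quasipoisson)

# The default, type = "link", returns predictions on the log scale.
log_pred <- predict(model, newdata = test)

# type = "response" returns expected counts (rates) on the original scale.
test$pred <- predict(model, newdata = test, type = "response")

# The response-scale predictions are the exponentiated link-scale ones.
all.equal(unname(exp(log_pred)), unname(test$pred))
```

Forgetting type = "response" gives log-scale values that look far too small when compared against the actual counts.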

10. Evaluate the model

We can calculate the root mean squared error of the model for February. On average, the prediction is off by about 69 rentals each hour, which is about half the standard deviation of the hourly rental counts.
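The RMSE calculation and its comparison to the standard deviation can be sketched as follows. The actual and pred vectors are made-up values for illustration, not the February results.

```r
# Hypothetical actual counts and predicted rates, for illustration only.
actual <- c(12, 40, 95, 160, 80, 30)
pred   <- c(20.5, 35.2, 110.0, 130.8, 92.3, 25.1)

# Root mean squared error: typical size of a prediction error.
rmse <- sqrt(mean((pred - actual)^2))
rmse

# Standard deviation of the outcome, as a baseline for comparison.
sd(actual)

# A ratio below 1 means the model beats always predicting the mean.
rmse / sd(actual)
```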

11. Compare Predictions and Actual Outcomes

We can also compare the predictions to actual outcomes as a function of time.

12. Let's practice!

Let’s now practice fitting and predicting from count models.