Count data

1. Count data

Previously, we saw how to model binary data, that is, data recorded as ones and zeros. Many datasets instead record how many times something happened, so data scientists frequently need to model count data as well.

2. Examples of count data

For example, we might have the number of events per hour, such as the number of new visitors to a website per hour. We might also have counts per area. For example, we might have the number of birds per grid cell or the number of cancer tumors per square centimeter. Count data differs from binomial data because it does not have an explicit maximum value.
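To make the contrast with binomial data concrete, here is a minimal sketch using simulated draws. The sample size, the number of trials, and the means are arbitrary values chosen for illustration, not values from the lesson.

```r
set.seed(123)

# Binomial: number of successes out of 10 trials, so a draw can never exceed 10.
binom_draws <- rbinom(1000, size = 10, prob = 0.5)

# Poisson: counts with an average of 5 and no built-in upper limit.
pois_draws <- rpois(1000, lambda = 5)

# Compare the observed ranges of the two samples.
range(binom_draws)
range(pois_draws)
```

Both samples have the same mean of 5, but only the binomial draws are capped by the number of trials.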

3. Alternative to Chi-square test

Beyond modeling counts directly, we can also use glmer() as an alternative to a Chi-square test. Many introductory statistics courses cover the Chi-square test as a way to compare count data. A generalized linear model with a Poisson error term offers another route. Broadly speaking, to use a Poisson error term instead of a Chi-square test, we estimate an intercept for each treatment group. We can then either examine each intercept estimate or use an ANOVA to examine whether the terms are different from zero. We will go over this as an example during the exercises.
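The idea of one intercept per treatment group can be sketched as follows. The data frame, its column names (`treatment`, `n_events`), and the counts are made up for illustration, and the model uses glm() rather than glmer() because this toy data has no grouping structure that would need a random effect.

```r
# Hypothetical event counts for three treatment groups (made-up numbers).
counts <- data.frame(
  treatment = factor(c("A", "B", "C")),
  n_events  = c(18, 30, 42)
)

# Classical approach: Chi-square test of equal counts across groups.
chisq_fit <- chisq.test(counts$n_events)
chisq_fit

# Poisson approach: "- 1" removes the shared intercept, so each treatment
# group gets its own intercept, estimated on the log scale.
pois_fit <- glm(n_events ~ treatment - 1, family = "poisson", data = counts)

# Option 1: examine each group's intercept estimate and its z-test.
summary(pois_fit)

# exp() maps the log-scale intercepts back to expected counts per group.
exp(coef(pois_fit))

# Option 2: analysis of deviance for the treatment term.
anova(pois_fit, test = "Chisq")
```

With this parameterization, the tests ask whether each group's log-scale intercept differs from zero. To test whether the groups differ from each other, which is closer to what the Chi-square test asks, the shared intercept would be kept (`n_events ~ treatment`).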

4. R Syntax for Poisson regression with `glmer`

Building a generalized linear model with a Poisson error term is relatively straightforward: in glm(), we pass "poisson" (lowercase, as R expects) to the family argument. We can build a glmer() model using the same family.
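As a minimal sketch of the syntax, the two calls look like this. The data frame `dat` and its columns (`y`, `x`, `group`) are simulated placeholders, not data from the lesson, and the example assumes the lme4 package is installed.

```r
library(lme4)  # provides glmer(); assumed to be installed

set.seed(1)

# Simulated placeholder data: 10 groups with 20 observations each.
dat <- data.frame(
  group = factor(rep(1:10, each = 20)),
  x     = rep(c(0, 1), times = 100)
)
group_effect <- rnorm(10, sd = 0.5)
dat$y <- rpois(
  nrow(dat),
  lambda = exp(1 + 0.3 * dat$x + group_effect[as.integer(dat$group)])
)

# Poisson GLM with fixed effects only.
# Note that R expects the lowercase family name "poisson".
glm_fit <- glm(y ~ x, family = "poisson", data = dat)

# Poisson GLMM: same family, plus a random intercept for each group.
glmer_fit <- glmer(y ~ x + (1 | group), family = "poisson", data = dat)

summary(glmer_fit)
```

The only difference between the two calls is the random-effect term `(1 | group)` in the formula; the family argument is identical.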

5. Marketing click through case study

For our first case study, a marketer has redesigned a webpage for a client and compared how often different test groups clicked through to new pages. The marketer knows that group should be a random effect because individuals within each group are not independent of one another. During this exercise, we'll apply what we've learned by fitting a generalized linear mixed-effects regression with a Poisson distribution.
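A model of this kind might be sketched as below. The exercise data is not shown here, so everything in this example is an assumption: the data is simulated, and the column names (`clicks`, `webpage`, `group`), the number of groups, and the effect sizes are all hypothetical. It also assumes the lme4 package is installed.

```r
library(lme4)  # provides glmer(), fixef(), and VarCorr(); assumed to be installed

set.seed(42)

# Hypothetical design: 8 test groups, 15 visitors per group and page version.
n_groups <- 8
clicks_df <- expand.grid(
  visitor = 1:15,
  webpage = c("old", "new"),
  group   = paste0("g", seq_len(n_groups)),
  stringsAsFactors = FALSE
)
clicks_df$webpage <- factor(clicks_df$webpage, levels = c("old", "new"))
clicks_df$group   <- factor(clicks_df$group, levels = paste0("g", seq_len(n_groups)))

# Simulate click counts: a made-up page effect plus a shift per group.
group_shift <- rnorm(n_groups, sd = 0.4)
log_rate <- 0.5 +
  0.4 * (clicks_df$webpage == "new") +
  group_shift[as.integer(clicks_df$group)]
clicks_df$clicks <- rpois(nrow(clicks_df), lambda = exp(log_rate))

# Poisson mixed-effects model: page version as a fixed effect,
# test group as a random intercept.
click_fit <- glmer(clicks ~ webpage + (1 | group),
                   family = "poisson", data = clicks_df)

# Fixed effects are on the log scale; exp() turns the page effect
# into a rate ratio (new page relative to old page).
fixef(click_fit)
exp(fixef(click_fit)["webpagenew"])

# Estimated between-group variability.
VarCorr(click_fit)
```

Treating group as a random intercept lets visitors in the same group share a baseline click rate, which accounts for the lack of independence within groups.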

6. Chlamydia by age-group and county data

For our second case study, we will use data from the State of Illinois. The State of Illinois collects cases of sexually transmitted infections, such as chlamydia, by county and age group. This data is important for public health planning and allocation of resources. Also, outbreaks of infections can require additional actions. These cases might also correspond to changes in public policy, such as sexual education. Lastly, marketers or drug researchers might be interested in outbreaks of diseases so that they know where to target drug development or marketing.

7. Let's apply Poisson regression!

Now that we've seen two examples, let's dive into the data.