ggplot2 and binomial regression

1. ggplot2 and binomial GLM

During this video, you'll learn about plotting binomial GLMs with ggplot2. The method is similar to plotting a Poisson GLM but includes some slight differences.

2. What can I see in my data?

For example, with our commuter data, we might want to look at a new variable and examine how commuting distance changes the probability of riding the bus. I would plot this using ggplot with the bus data frame from before. The y aesthetic would be bus and the x aesthetic would be "MileOneWay". I would add "geom_point" to draw the points. However, notice how the points sit on top of one another.
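
To make that first plot concrete, here is a minimal sketch. The commuter data are not included here, so I simulate a stand-in: the data frame name bus_sim, the column names Bus and MileOneWay, and all of the values are assumptions for illustration only.

```r
library(ggplot2)

# Simulated stand-in for the commuter data (hypothetical values, not the real survey).
set.seed(42)
miles <- runif(200, min = 0, max = 30)
rides <- rbinom(200, size = 1, prob = plogis(1 - 0.15 * miles))
bus_sim <- data.frame(
  MileOneWay = miles,
  Bus = factor(ifelse(rides == 1, "Yes", "No"), levels = c("No", "Yes"))
)

# Two-level response on the y axis, commute distance on the x axis.
p_points <- ggplot(bus_sim, aes(x = MileOneWay, y = Bus)) +
  geom_point()

p_points
```

With only two possible y values, every point lands on one of two horizontal lines, which is why they overlap.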

3. geom_jitter()

As with the Poisson data, your points will often overlap with binomial data as well. One fix is to jitter your points using "geom_jitter" rather than "geom_point". Here I only jitter the height and not the width, so the points spread out vertically while each distance stays exactly where it was measured.
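
A sketch of the jittered version follows, again on simulated stand-in data (bus_sim, Bus, and MileOneWay are assumed names). The height of 0.05 is an illustrative choice, not a value taken from the slides.

```r
library(ggplot2)

# Simulated stand-in for the commuter data (hypothetical values).
set.seed(42)
miles <- runif(200, min = 0, max = 30)
rides <- rbinom(200, size = 1, prob = plogis(1 - 0.15 * miles))
bus_sim <- data.frame(
  MileOneWay = miles,
  Bus = factor(ifelse(rides == 1, "Yes", "No"), levels = c("No", "Yes"))
)

# width = 0 leaves the x positions untouched; height = 0.05 adds a small
# random vertical offset so overlapping points become visible.
p_jitter <- ggplot(bus_sim, aes(x = MileOneWay, y = Bus)) +
  geom_jitter(width = 0, height = 0.05)

p_jitter
```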

4. geom_smooth()

I often use geom_smooth to look at data. However, when we run it on our data, we have a problem. geom_smooth cannot fit a curve to a categorical response. Thus, we will need to change our response from a factor to a numeric variable.

5. factor to numeric

First, I would look at the structure of my data to make sure the response is a factor. Second, I convert the factor to numeric. This returns the level codes 1 and 2, so I subtract one and my values are now 0 and 1.
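
Here is a small sketch of that conversion on a toy factor. The level names "No" and "Yes" and the variable names are assumptions; the real data may be labelled differently.

```r
# Toy two-level factor standing in for the bus variable (hypothetical labels).
Bus <- factor(c("No", "Yes", "Yes", "No"), levels = c("No", "Yes"))

# Check the structure first to confirm it is a factor.
str(Bus)

# as.numeric() on a factor returns the integer level codes (1 and 2 here),
# so subtracting one gives a 0/1 variable.
Bus_num <- as.numeric(Bus) - 1
Bus_num
```

Note that this relies on the level order: the first level becomes 0 and the second becomes 1.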

6. geom_smooth()

Now that we've created a numeric bus variable, let's try the smooth again. This time, we get a different result: a weird smoothed curve that goes below zero, which makes no sense for a probability. Looks like we will need to change another setting.
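
A sketch of this step on simulated stand-in data (bus_sim, Bus_num, and MileOneWay are assumed names). I spell out method = "loess" here, which is what geom_smooth picks by default for a data set this small. Whether the curve actually leaves the 0 to 1 range depends on the data, so treat this as an illustration rather than a reproduction of the slide.

```r
library(ggplot2)

# Simulated stand-in for the commuter data, with a numeric 0/1 response.
set.seed(42)
miles <- runif(200, min = 0, max = 30)
bus_sim <- data.frame(
  MileOneWay = miles,
  Bus_num = rbinom(200, size = 1, prob = plogis(1 - 0.15 * miles))
)

# A loess smoother knows nothing about probabilities, so nothing
# keeps the fitted curve between 0 and 1.
p_loess <- ggplot(bus_sim, aes(x = MileOneWay, y = Bus_num)) +
  geom_jitter(width = 0, height = 0.05) +
  geom_smooth(method = "loess", formula = y ~ x)

p_loess
```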

7. linear models

Let's try using method equals glm. We'll first just plot the default glm(). Now, however, the results still do not look correct. ggplot2 has fit a straight line to our data because glm() defaults to the Gaussian family, which is the same as a linear model.
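
The sketch below shows this on simulated stand-in data (bus_sim, Bus_num, and MileOneWay are assumed names). It also fits the same model with glm() and lm() directly, to show that the default glm() is just a linear model.

```r
library(ggplot2)

# Simulated stand-in for the commuter data, with a numeric 0/1 response.
set.seed(42)
miles <- runif(200, min = 0, max = 30)
bus_sim <- data.frame(
  MileOneWay = miles,
  Bus_num = rbinom(200, size = 1, prob = plogis(1 - 0.15 * miles))
)

# method = "glm" with no family uses glm()'s default, gaussian().
p_glm_default <- ggplot(bus_sim, aes(x = MileOneWay, y = Bus_num)) +
  geom_jitter(width = 0, height = 0.05) +
  geom_smooth(method = "glm", formula = y ~ x)

# The same comparison outside ggplot2: default glm() and lm() agree.
fit_glm <- glm(Bus_num ~ MileOneWay, data = bus_sim)
fit_lm  <- lm(Bus_num ~ MileOneWay, data = bus_sim)
coef(fit_glm)
coef(fit_lm)

p_glm_default
```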

8. Logistic regressions

If we use the method-dot-args option (method.args), we can pass inputs to glm as a list. Here, we can set family equal to binomial. Now, we have a reasonable plot of our data: an S-shaped curve that stays between 0 and 1. Other courses on ggplot2 at DataCamp can help you learn how to clean up this figure to be publication or presentation quality.
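
A sketch of the logistic version follows, again on simulated stand-in data (bus_sim, Bus_num, and MileOneWay are assumed names).

```r
library(ggplot2)

# Simulated stand-in for the commuter data, with a numeric 0/1 response.
set.seed(42)
miles <- runif(200, min = 0, max = 30)
bus_sim <- data.frame(
  MileOneWay = miles,
  Bus_num = rbinom(200, size = 1, prob = plogis(1 - 0.15 * miles))
)

# method.args is a list of extra arguments that geom_smooth hands to glm().
p_logistic <- ggplot(bus_sim, aes(x = MileOneWay, y = Bus_num)) +
  geom_jitter(width = 0, height = 0.05) +
  geom_smooth(method = "glm", formula = y ~ x,
              method.args = list(family = "binomial"))

p_logistic
```

Because the fit is now a binomial GLM, the plotted curve is a predicted probability and cannot leave the 0 to 1 range.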

9. Logit vs probit

We can also use this option to compare probits and logits by using the binomial function with the appropriate link function. If we set se to FALSE, then the confidence intervals go away and we can look at the lines better. I also changed the line colors so the two lines are different. Notice that the two curves are similar, highlighting how probit and logit usually produce very similar predicted probabilities.
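
Here is a sketch of that comparison on simulated stand-in data (bus_sim, Bus_num, and MileOneWay are assumed names). The colors "red" and "blue" are my own illustrative choice.

```r
library(ggplot2)

# Simulated stand-in for the commuter data, with a numeric 0/1 response.
set.seed(42)
miles <- runif(200, min = 0, max = 30)
bus_sim <- data.frame(
  MileOneWay = miles,
  Bus_num = rbinom(200, size = 1, prob = plogis(1 - 0.15 * miles))
)

# One geom_smooth per link function; se = FALSE hides the confidence bands.
p_links <- ggplot(bus_sim, aes(x = MileOneWay, y = Bus_num)) +
  geom_jitter(width = 0, height = 0.05) +
  geom_smooth(method = "glm", formula = y ~ x, se = FALSE, color = "red",
              method.args = list(family = binomial(link = "logit"))) +
  geom_smooth(method = "glm", formula = y ~ x, se = FALSE, color = "blue",
              method.args = list(family = binomial(link = "probit")))

p_links
```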

10. Summary of steps

In summary, you can plot a binomial regression in a few steps. You jitter the points to avoid overlap. You make sure the response is numeric, coded as 0 and 1. You add a smoothed line with geom_smooth. And you specify the correct model and family with method and method.args. Last, you'll want to polish your figure, something covered in other DataCamp courses.

11. Let's practice!

Now, it's your turn to plot a binomial regression.