
Multiple logistic regression

1. Multiple logistic regression

Welcome back to the final chapter in the DataCamp course on GLMs.

2. Chapter overview

During this chapter, you will learn about multiple logistic regression, the power of the formula syntax for multiple regression, and the assumptions behind multiple regression with GLMs. Now, let's start by looking at multiple logistic regression.

3. Why multiple regression?

Why do we need multiple regression? Previously, you learned about generalized linear models with one intercept and one slope. But what if we have multiple predictors? Rather than choosing between them, we can include them all using multiple regression.

4. Multiple predictor variables

Simple linear models and GLMs are limited to one slope and one intercept. Their formula looks like this one, with an intercept beta0 and a slope beta1, as well as an error term. In contrast, multiple regression allows us to have multiple slopes and intercepts. For example, we can have a global intercept beta0, plus additional slopes or intercepts represented by other betas.
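
The slide's formulas are not captured in this transcript; written out in the beta notation used above, the standard forms are (a sketch, not the slide verbatim):

    y = beta0 + beta1 * x1 + error                       (one intercept, one slope)
    y = beta0 + beta1 * x1 + beta2 * x2 + ... + error    (multiple predictors)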

5. Too much of a good thing

Like most things in life, we can have too much of a good thing. In this case, we can have too many predictor variables. Linear algebra tells us the maximum number of coefficients we can estimate is the number of observations. However, if we have as many coefficients as observations, we will get a perfect fit between our data and model, and that model will almost certainly not work well with other datasets. The term for this problem is "over-fitting". To avoid over-fitting, we usually want at least 10 observations for each coefficient.

6. Bus data: Two possible predictors

In the previous chapters, we explored two variables with the bus data: the number of days a commuter commutes, and the distance of the commuter's commute. However, each time we built a separate model for each predictor. Instead, we can build a single model with both! We use the glm function and include both predictor variables separated by a plus sign.
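
A minimal sketch of that call; the column names Bus, CommuteDays, and MilesOneWay and the data frame name bus are assumptions based on how the variables are described in this transcript:

    # Logistic regression with both predictors, joined by a plus sign
    # (column and data frame names are assumed, not confirmed by the transcript)
    bus_both <- glm(Bus ~ CommuteDays + MilesOneWay,
                    data = bus,
                    family = "binomial")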

7. Summary of GLM with multiple predictors

Running the summary of this model gives us output similar to what we've seen before. However, we now have an intercept and two coefficient estimates. In this case, both slopes are significantly different from zero. Commute days is positive, so more commuting days increases the chance of riding the bus. Conversely, miles one way is negative, so commuting farther decreases the chance of riding the bus. During the exercise, you will be able to compare single-predictor models to a multiple-predictor model with this data.
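
Continuing the sketch above, the summary comes from:

    # Coefficient table: an intercept plus one slope per predictor
    summary(bus_both)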

8. Correlation between predictors

The relationship between the predictors themselves is important. As a visual refresher, the theoretical variables x1 and x2 are correlated in the figure on the left. That is to say, as x1 increases, x2 also increases. Conversely, in the figure on the right, the variables are not correlated.
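
To check this for your own predictors, you can compute their correlation directly (reusing the assumed column names from the sketch above):

    # Values near 0 suggest the predictors are not strongly correlated;
    # values near -1 or 1 suggest they are
    cor(bus$CommuteDays, bus$MilesOneWay)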

9. Order of coefficients

If the predictors are not correlated, formula order is not important. For example, the formula x1 + x2 will produce similar coefficient estimates to x2 + x1. Conversely, if the predictors are correlated, changing the order may produce different results in order-dependent outputs, such as the sequential analysis-of-deviance table. Thus, the formula x1 + x2 may produce different results than x2 + x1.
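
One way to see this is to fit the model with both term orders and compare the sequential analysis-of-deviance tables, which decompose the deviance term by term (a sketch, again reusing the assumed bus variables):

    # Same model, two term orders
    m12 <- glm(Bus ~ CommuteDays + MilesOneWay, data = bus, family = "binomial")
    m21 <- glm(Bus ~ MilesOneWay + CommuteDays, data = bus, family = "binomial")

    # Sequential deviance tables can differ when the predictors are correlated
    anova(m12, test = "Chisq")
    anova(m21, test = "Chisq")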

10. Let's practice!

Now that you've learned about multiple regression, you'll get a chance to code these models yourself in R.
