More complex modeling
1. More complex modeling
Now that we can build a simple linear regression model, where "simple" means that we have one quantitative predictor variable, let's look at a multiple linear regression model. This is when we include more than one predictor variable, and some of the predictors may be categorical variables.
2. Multiple linear regression
For example, how do we get the regression line that is associated with the curves in this plot?
3. Multiple linear regression
In this case, we want to build a multiple linear regression model. We modify our previous equation to reflect the fact that we now have not one but p predictor variables. In the case of the babies data, we have two predictors, AgeMonths and Gender.
4. Multiple linear regression
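Written out, the general form with p predictors might look like the line below. The notation is an assumption, since the slide equation is not reproduced in this transcript: the betas correspond to the B0, B1, ... coefficients named later, and E[y] stands for the expected value of the response.

```latex
E[y] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p
```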
But we can't add Gender into the equation directly because it isn't numerical. Therefore, we need to transform the variable to equal 1 for male and 0 for female.
5. Multiple linear regression
I have done so here using the case_when() function: on the left-hand side of the tilde you specify the case, and on the right-hand side you give the value that the new variable, Gender2, takes in that case.
6. Multiple linear regression
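As a rough sketch of the case_when() recoding just described: the slide code is not part of this transcript, so the data frame name `babies` and its toy values below are assumptions made only for illustration. Only `Gender`, `AgeMonths`, `Gender2`, and case_when() come from the lesson itself.

```r
library(dplyr)

# Toy stand-in for the babies data (hypothetical values, not the real data set)
babies <- data.frame(
  AgeMonths = c(2, 5, 9, 14),
  Gender = factor(c("male", "female", "female", "male"))
)

# Left of the tilde: the case to match.
# Right of the tilde: the value Gender2 takes in that case.
babies <- babies %>%
  mutate(Gender2 = case_when(
    Gender == "male" ~ 1,
    Gender == "female" ~ 0
  ))

babies
```

Any row whose Gender matches neither case would get NA in Gender2, which makes unexpected categories easy to spot.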
Now our regression equation has two x's: x1 for age and x2 for gender. For males, x2 equals 1 and so their intercept is B0 plus B2 with a slope of B1. And for the females, x2 equals 0 so their intercept is only B0.
7. Multiple linear regression
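The two-line reading of the model can be worked out by substituting x2 = 1 and x2 = 0 into the equation. This is a sketch in assumed notation (betas for the B coefficients, E[y] for the expected response), not a copy of the slide.

```latex
\begin{aligned}
E[y] &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
\text{males } (x_2 = 1):\quad E[y] &= (\beta_0 + \beta_2) + \beta_1 x_1 \\
\text{females } (x_2 = 0):\quad E[y] &= \beta_0 + \beta_1 x_1
\end{aligned}
```

Both lines share the slope on x1, so B2 is simply the vertical gap between the male and female lines.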
This model can be fit using the svyglm() function where we add the new predictor, Gender, to the formula. Notice I can supply the factor variable and R converts it to ones and zeros in the background! And, now the coefficient table from running summary() contains a new row for Gendermale and provides estimates of B0, B1, and B2. However, the hypothesis tests associated with the coefficients table have now changed.
8. Multiple linear regression
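A minimal sketch of such a fit follows, under several assumptions: the transcript does not name the response variable or show the survey design, so `HeadCirc`, the weight column `wt`, the design object `toy_design`, and the simulated data are all hypothetical placeholders. Only svyglm(), summary(), `AgeMonths`, `Gender`, and the `Gendermale` row come from the lesson.

```r
library(survey)

# Simulated stand-in data (hypothetical; not the babies data from the lesson)
set.seed(1)
n <- 200
toy <- data.frame(
  AgeMonths = runif(n, 0, 24),
  Gender = factor(sample(c("female", "male"), n, replace = TRUE)),
  wt = runif(n, 1, 3)  # made-up survey weights
)
# Placeholder response built from an arbitrary linear rule plus noise
toy$HeadCirc <- 36 + 0.5 * toy$AgeMonths + 1 * (toy$Gender == "male") + rnorm(n)

# Simplest possible design: no clusters, weights only (an assumption)
toy_design <- svydesign(ids = ~1, weights = ~wt, data = toy)

# The factor Gender goes straight into the formula; R builds the 0/1 indicator
mod <- svyglm(HeadCirc ~ AgeMonths + Gender, design = toy_design)

# Coefficient table with rows (Intercept), AgeMonths, and Gendermale
summary(mod)
```

Because "female" sorts first among the factor levels, it becomes the reference level, which is why the extra row is labelled Gendermale.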
Now each hypothesis test asks whether or not that variable should be included in a model that already contains the other predictors. For example, for Gendermale, we want to know if it should be added to a model that contains age. And, as we saw in simple linear regression, the test statistic, which follows a t-distribution, is the estimated coefficient over its standard error.
9. Multiple linear regression
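For the Gendermale row, the test described above could be written as follows. The notation is assumed rather than taken from the slide: beta_2 is the coefficient called B2 in this lesson, and the hats mark estimates from the fitted model that also contains age.

```latex
H_0: \beta_2 = 0 \quad \text{vs.} \quad H_A: \beta_2 \neq 0,
\qquad
t = \frac{\hat{\beta}_2}{\mathrm{SE}(\hat{\beta}_2)}
```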
For AgeMonths, we want to know if it should be added to a model that contains Gender. Both tests have a small p-value, indicating that both variables provide useful information to the model. As before, the hypothesis test associated with the intercept row isn't really of interest.
10. Multiple linear regression
We fit a multiple linear regression model that forced the slope for both genders to be equal to B1. That seems reasonable for this model based on our scatter plot. But what if we wanted the slopes to be different? How does that impact the form of the regression model?
11. Let's practice!
You will practice building that particular model in the exercises!