Eliminating variables from the model - adjusted R-squared selection
Now you will create a new model by dropping, one at a time, each variable from the full model, and determine which removal yields the highest improvement in the adjusted \(R^2\).
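For reference, the adjusted \(R^2\) penalizes \(R^2\) for the number of predictors in the model, so dropping an uninformative variable can increase it. In its common form, with \(n\) observations and \(p\) predictors:

\[
R^2_{adj} = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}
\]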
This exercise is part of the course *Data Analysis and Statistical Inference*.
Exercise instructions
- Create a new model, `m1`, where you remove `rank` from the list of explanatory variables. Check out the adjusted \(R^2\) of this new model and compare it to the adjusted \(R^2\) of the full model.
- If you don't want to view the entire model output, but just the adjusted R-squared, use `summary(m1)$adj.r.squared`.
- Create another new model, `m2`, where you remove `ethnicity` from the list of explanatory variables. Check out the adjusted \(R^2\) of this new model and compare it to the adjusted \(R^2\) of the full model.
- Repeat until you have tried removing each variable from the full model `m_full` one at a time, and determine the removal of which variable yields the highest improvement in the adjusted \(R^2\).
- Make note of this variable (you will be asked about it in the next question).
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# The evals data frame is already loaded into the workspace
# The full model:
m_full <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval + cls_students +
cls_level + cls_profs + cls_credits + bty_avg, data = evals)
summary(m_full)$adj.r.squared
# Remove rank:
m1 <- lm(score ~ ethnicity + gender + language + age + cls_perc_eval + cls_students + cls_level +
cls_profs + cls_credits + bty_avg, data = evals)
summary(m1)$adj.r.squared
# Remove ethnicity:
m2 <- lm(score ~ rank + gender + language + age + cls_perc_eval + cls_students + cls_level +
  cls_profs + cls_credits + bty_avg, data = evals)
summary(m2)$adj.r.squared
# Remove gender:
m3 <- lm(score ~ rank + ethnicity + language + age + cls_perc_eval + cls_students + cls_level +
  cls_profs + cls_credits + bty_avg, data = evals)
summary(m3)$adj.r.squared
# ...
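Rather than writing out eleven nearly identical model formulas by hand, the leave-one-out comparison above can be automated. The following R sketch (assuming, as in the exercise, that the `evals` data frame is loaded) fits one model per dropped predictor and reports the drop that gives the highest adjusted \(R^2\):

```r
# All predictors in the full model:
predictors <- c("rank", "ethnicity", "gender", "language", "age",
                "cls_perc_eval", "cls_students", "cls_level",
                "cls_profs", "cls_credits", "bty_avg")

# For each predictor, fit the model without it and record the adjusted R-squared.
adj_r2 <- sapply(predictors, function(v) {
  f <- reformulate(setdiff(predictors, v), response = "score")
  summary(lm(f, data = evals))$adj.r.squared
})

# The variable whose removal yields the highest adjusted R-squared:
names(which.max(adj_r2))
```

`reformulate()` (base R) builds the formula `score ~ ...` from the remaining predictor names, so each iteration fits the full model minus one variable.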