# Multiple regression: model selection

In the video, Professor Conway talked about the \(R^2\) coefficients of regression models. These coefficients are often used in practice to select the best regression model among competing models. The **\(R^2\) coefficient of a regression model** is defined as the percentage of the variation in the outcome variable that can be explained by the predictor variables of the model. In general, the \(R^2\) coefficient of a model increases, or at least never decreases, when more predictor variables are added to the model. After all, adding more predictor variables tends to increase the amount of variation in the outcome variable that the model can explain.
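In symbols, using the standard definition (not shown in the video excerpt above), the coefficient compares the residual sum of squares \(SS_{\text{res}}\) to the total sum of squares \(SS_{\text{tot}}\):

\[
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
= 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
\]

Adding a predictor can only shrink (or leave unchanged) \(SS_{\text{res}}\), which is why \(R^2\) never decreases as predictors are added.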

Check this general rule by comparing the \(R^2\) coefficient of a single regression model with that of a multiple regression model.

## Instructions

**100 XP**

- First, perform a **single regression** of the outcome variable `salary` onto the predictor variable `years` and save it in the variable `model_1`. Secondly, perform a **multiple regression** of the outcome variable `salary` onto the predictor variables `years` and `pubs` simultaneously by using the `lm()` function. Save the latter in the variable `model_2`. Afterwards, make sure to check out the regression output by means of the `summary()` function.
- Save the **\(R^2\) coefficients** of the two regression models into two preliminary variables. You will need them later! Extract these numbers from the regression output by means of the `r.squared` component, in the same way that you would extract specific variables from a data set with the `$` operator. Take two seconds to think this through. Check the hint if you are stuck.
- Lastly, create an empty vector `r_squared`. Then, **round off** the \(R^2\) coefficients to three decimal places and save them in different elements of the vector. Print out the vector to see whether or not the \(R^2\) coefficient has increased due to the extra predictor variable that was added to the model.
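The steps above can be sketched as follows. Since the course data set is not shown here, a small data frame with columns `salary`, `years`, and `pubs` is simulated for illustration; in the exercise itself these variables already exist in your workspace.

```r
# Simulated stand-in for the course data set (assumption, not the real data)
set.seed(1)
n <- 100
years  <- runif(n, 0, 30)
pubs   <- rpois(n, lambda = 2 + 0.5 * years)
salary <- 30000 + 1500 * years + 400 * pubs + rnorm(n, sd = 5000)
df <- data.frame(salary, years, pubs)

# Single regression of salary onto years
model_1 <- lm(salary ~ years, data = df)
summary(model_1)

# Multiple regression of salary onto years and pubs
model_2 <- lm(salary ~ years + pubs, data = df)
summary(model_2)

# Extract the R^2 coefficients from the regression output with $
preliminary_model_1 <- summary(model_1)$r.squared
preliminary_model_2 <- summary(model_2)$r.squared

# Round off to three decimals and store both in one vector
r_squared <- c(round(preliminary_model_1, 3), round(preliminary_model_2, 3))
r_squared
```

Because `model_2` contains every predictor of `model_1` plus one more, its \(R^2\) value (the second element of `r_squared`) should be at least as large as the first.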