Exercise

# Creating dummy variables (2)

To include a categorical variable in a regression, the variable must be converted into numeric form by means of dummy variables. Previously, dummy variables were generated with the intuitive but less general `dummy.code()` function from the `psych` package.

From this point onwards, the contrast function `C()` is used to create dummy variables. Do not confuse it with the `c()` function, which combines values into a vector or list. `C()` takes a categorical variable as its first argument and the contrast type, here `treatment`, as its second argument. With treatment contrasts, R takes the first level of the factor as the reference group; by default, the levels are ordered alphabetically.
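
As a minimal sketch of how `C()` behaves, the snippet below applies it to a small made-up vector of department labels. The labels are hypothetical and are not the course's data; only base R functions are used.

```r
# Hypothetical department labels (an assumption, not the course data)
dept <- c("marketing", "finance", "hr", "finance", "marketing", "hr")

# C() converts its first argument to a factor and attaches treatment contrasts
dept_code <- C(dept, treatment)

# Levels are sorted alphabetically by default, so "finance" comes first
# and serves as the reference group
levels(dept_code)

# The contrast matrix has one 0/1 column per non-reference level;
# the reference level's row contains only zeros
contrasts(dept_code)
```

With three levels, the contrast matrix has two columns, one for each level that is compared against the reference group.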

This exercise illustrates the inclusion of the categorical variable `dept` in a multiple regression. The code on the right estimates the regression **without the categorical variable**. The `summary()` function summarizes the regression results of `model`, and the `confint()` function computes confidence intervals for the coefficients.
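
For orientation, a regression without the categorical variable might look like the sketch below. The data frame `fs` and its simulated values are assumptions made purely for illustration; only the variable names `salary`, `years` and `publications` come from the exercise.

```r
# Simulated stand-in data; `fs` and all values are hypothetical
set.seed(1)
n <- 60
fs <- data.frame(
  years = sample(1:30, n, replace = TRUE),
  publications = sample(0:40, n, replace = TRUE)
)
fs$salary <- 40000 + 900 * fs$years + 300 * fs$publications +
  rnorm(n, sd = 5000)

# Multiple regression without the categorical variable
model <- lm(salary ~ years + publications, data = fs)

# Coefficient table, standard errors and fit statistics
summary(model)

# Confidence intervals for the coefficients (95% by default)
confint(model)
```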

Instructions

- Construct a **dummy variable** for `dept`, `dept_code`, using the `C()` function.
- Set up the regression **with categorical variable**: regress salary on years, publications and department using the `lm()` function and assign the name `model_dummy` to the regression. Use the `dept_code` variable to incorporate the categorical variable. Again, provide some summary statistics and confidence intervals using respectively the `summary()` function and the `confint()` function.
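
The steps above can be sketched on simulated data as follows. The data frame `fs`, the department labels and all numeric values are hypothetical; the actual exercise uses its own preloaded data, so treat this as an illustration of the pattern and not as the reference solution.

```r
# Simulated stand-in data; `fs`, the labels and all values are hypothetical
set.seed(2)
n <- 90
fs <- data.frame(
  years = sample(1:30, n, replace = TRUE),
  publications = sample(0:40, n, replace = TRUE),
  dept = sample(c("finance", "hr", "marketing"), n, replace = TRUE)
)
fs$salary <- 40000 + 900 * fs$years + 300 * fs$publications +
  5000 * (fs$dept == "marketing") + rnorm(n, sd = 5000)

# Dummy-code the department with treatment contrasts;
# the first level ("finance") becomes the reference group
dept_code <- C(fs$dept, treatment)

# Regression including the categorical variable
model_dummy <- lm(salary ~ years + publications + dept_code, data = fs)

# Summary statistics and confidence intervals
summary(model_dummy)
confint(model_dummy)
```

Each `dept_code` coefficient estimates the difference in expected salary between that department and the reference group, holding years and publications constant.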