Dealing with multicollinearity
In the previous exercise, you found that multicollinearity exists in your model by reviewing the variance inflation factor (VIF) values of the independent variables. Follow the steps below to remove multicollinearity:
- Step 1: Calculate the VIF of each independent variable in the model
- Step 2: Check whether any variable has a VIF greater than or equal to 5
- Step 2a: If exactly one variable has a VIF greater than or equal to 5, remove it from the model
- Step 2b: If several variables have a VIF greater than or equal to 5, remove only the one with the highest VIF
- Step 3: Repeat steps 1 and 2 until the VIF of every variable is less than 5
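The steps above can be sketched as a loop. This is a minimal illustration in base R, not the course's solution: `compute_vif()` and `drop_collinear()` are hypothetical helpers, the data is simulated, and the VIF used is the textbook 1 / (1 - R^2) for numeric predictors, which may differ from what the VIF function used in the course reports for a logistic model.

```r
# Hypothetical helper: textbook VIF for numeric predictors, 1 / (1 - R^2),
# where R^2 comes from regressing each predictor on the remaining ones.
compute_vif <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    if (length(others) == 0) return(1)  # a lone predictor cannot be collinear
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

# Hypothetical helper for steps 1 to 3: drop the predictor with the highest
# VIF, one at a time, until every VIF is below the threshold.
drop_collinear <- function(data, predictors, threshold = 5) {
  repeat {
    vifs <- compute_vif(data, predictors)                      # Step 1
    if (length(predictors) < 2 || max(vifs) < threshold) break # Step 3 stop rule
    predictors <- setdiff(predictors, names(which.max(vifs)))  # Steps 2a and 2b
  }
  predictors
}

# Simulated data (an assumption, not the course data): x2 is nearly a copy of x1.
set.seed(42)
n <- 200
x1 <- rnorm(n)
toy <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(n, sd = 0.1),
  x3 = rnorm(n)
)
toy$y <- rbinom(n, 1, plogis(0.5 * toy$x1 - 0.3 * toy$x3))

# Keep only predictors that pass the VIF check, then refit the logistic model.
kept <- drop_collinear(toy, c("x1", "x2", "x3"))
final_model <- glm(reformulate(kept, response = "y"),
                   family = "binomial", data = toy)
```

Removing one variable per pass matters because VIF values are recomputed after each removal: dropping the worst offender often brings the VIF of its correlated partners below the threshold, so they can stay in the model.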
This exercise is part of the course
HR Analytics: Predicting Employee Churn in R
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Remove level
model_1 <- glm(turnover ~ . - ___, family = "binomial",
               data = train_set_multi)
# Check multicollinearity again
___
# Which variable has the highest VIF value?
highest <- ___