Avoiding multicollinearity
Back to our sales dataset salesData, which is already loaded in the workspace. The package rms is loaded as well.
Let's estimate a multiple linear regression! Of course, we want to make use of all the variables in the dataset.
This exercise is part of the course
Machine Learning for Marketing Analytics in R
Exercise instructions
- Go ahead and calculate a full model called salesModel1 using all variables except id in order to explain the sales in this month. To do this, fill the right variable names into the following dummy syntax: response ~ . - excluded_variable. This can be read as "response modeled by all variables except excluded_variable."
- Estimate the variance inflation factors using the vif() function from the rms package.
- In addition to excluding the variable id, remove the variables preferredBrand and nBrands in order to avoid multicollinearity. You do this by appending each of them to the formula, preceded by a minus sign (-). Store the model in an object called salesModel2.
- Reestimate the variance inflation factors of the model. Would you accept the results now?
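To see what a variance inflation factor measures before applying vif() to the sales data, here is a minimal sketch in base R. It uses the standard definition, VIF = 1 / (1 - R^2), where R^2 comes from regressing one predictor on all the other predictors. The data frame toyData and the helper manualVif() are made up for illustration; they are not part of the course material or of the rms package.

```r
# Simulated data (hypothetical, not the course's salesData)
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # almost a copy of x1, so strongly collinear
x3 <- rnorm(n)                  # generated independently of x1 and x2
y  <- 1 + 2 * x1 + 0.5 * x3 + rnorm(n)
toyData <- data.frame(y, x1, x2, x3)

# Hypothetical helper: VIF of each predictor, computed as 1 / (1 - R^2)
# from a regression of that predictor on all remaining predictors.
manualVif <- function(data, response) {
  predictors <- setdiff(names(data), response)
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    fit <- lm(reformulate(others, response = p), data = data)
    1 / (1 - summary(fit)$r.squared)
  })
}

# All predictors: x1 and x2 explain each other almost perfectly
vifAll <- manualVif(toyData, "y")
print(vifAll)

# After dropping x2, the remaining predictors are nearly unrelated
vifReduced <- manualVif(toyData[, c("y", "x1", "x3")], "y")
print(vifReduced)
```

In this toy setup the factors for x1 and x2 come out very large, and they fall back to roughly 1 once x2 is removed. A common rule of thumb treats values above 5 or 10 as a warning sign, although the exact cutoff is a judgment call.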
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Estimating the full model
salesModel1 <- lm(salesThisMon ~ . - ___,
                  data = salesData)

# Checking variance inflation factors
vif(___)

# Estimating new model by removing information on brand
salesModel2 <- lm(salesThisMon ~ . - ___,
                  data = ___)

# Checking variance inflation factors
___
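One possible way to fill in the blanks, following the instructions literally, is sketched here. It assumes the exercise workspace: salesData must be loaded and the rms package installed. The names fullFormula and reducedFormula are introduced only for this sketch, and the model-fitting part is skipped when that workspace is not available.

```r
# Formulas taken from the exercise text: "." stands for all other columns,
# and each "- name" removes one of them again.
fullFormula    <- salesThisMon ~ . - id
reducedFormula <- salesThisMon ~ . - id - preferredBrand - nBrands

# Assumption: salesData is loaded and rms is installed, as in the exercise.
# Outside that workspace this block does nothing.
if (exists("salesData") && requireNamespace("rms", quietly = TRUE)) {
  # Full model: every variable except id
  salesModel1 <- lm(fullFormula, data = salesData)
  print(rms::vif(salesModel1))

  # Reduced model: brand information removed as well
  salesModel2 <- lm(reducedFormula, data = salesData)
  print(rms::vif(salesModel2))
}
```

Comparing the two printed sets of factors shows whether dropping preferredBrand and nBrands brought the remaining values down to a level you would accept.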