Avoiding multicollinearity
Back to our sales dataset salesData, which is already loaded in the workspace. Additionally, the package rms is loaded.
Let's estimate a multiple linear regression! Of course, we want to make use of all the variables in the dataset.
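As a minimal sketch of what "all variables" means in an R formula, the snippet below fits a multiple linear regression on a small simulated data frame. The names toyData, x1, x2, x3 and y are hypothetical stand-ins, not columns of salesData. The dot on the right-hand side of y ~ . stands for every column of the data frame except the response.

```r
# Hypothetical toy data standing in for salesData (not the real dataset)
set.seed(42)
n <- 100
toyData <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rnorm(n)
)
# Response generated from a known linear relationship plus noise
toyData$y <- 1 + 2 * toyData$x1 - 0.5 * toyData$x2 + rnorm(n, sd = 0.3)

# "y ~ ." regresses y on every other column of toyData
fullFit <- lm(y ~ ., data = toyData)
print(coef(fullFit))
```

The fitted model has one coefficient per predictor column plus the intercept, without any column having to be typed out.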
This exercise is part of the course Machine Learning for Marketing Analytics in R.
Exercise instructions
- Go ahead and calculate a full model called salesModel1 using all variables except id in order to explain this month's sales (salesThisMon). To do this, fill in the right variable names in the following dummy syntax: response ~ . - excluded_variable. This can be read as "response modeled by all variables except excluded_variable."
- Estimate the variance inflation factors using the vif() function from the rms package.
- In addition to excluding the variable id, remove the variables preferredBrand and nBrands in order to avoid multicollinearity. You do this by appending each of them to the formula, preceded by a minus sign (-). Store the model in an object called salesModel2.
- Re-estimate the variance inflation factors of the model. Would you accept the results now?
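The dummy syntax response ~ . - excluded_variable can be checked by asking R which predictors survive once the dot is expanded against a data frame. The sketch below assumes a hypothetical five-row data frame whose column names partly mimic the exercise (id, nBrands, preferredBrand, salesThisMon); otherPredictor is made up for illustration, and none of the values come from salesData.

```r
# Hypothetical miniature data frame; names only partly mimic the exercise
df <- data.frame(
  id = 1:5,
  nBrands = c(2, 3, 1, 4, 2),
  preferredBrand = factor(c("a", "b", "a", "c", "b")),
  otherPredictor = c(10, 12, 8, 15, 11),
  salesThisMon = c(100, 120, 90, 150, 110)
)

# The dot expands to every column except the response; "- id" then drops id
fullTerms <- attr(terms(salesThisMon ~ . - id, data = df), "term.labels")
print(fullTerms)

# Each further "- variable" removes one more predictor from the expansion
reducedTerms <- attr(
  terms(salesThisMon ~ . - id - preferredBrand - nBrands, data = df),
  "term.labels"
)
print(reducedTerms)
```

Only otherPredictor should remain in the reduced formula, which mirrors what the exercise asks for when it excludes id, preferredBrand and nBrands.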
Hands-on interactive exercise
Try this exercise by completing the sample code below.
library(rms)  # already loaded in the exercise workspace; provides vif()

# Estimating the full model
salesModel1 <- lm(salesThisMon ~ . - ___,
                  data = salesData)

# Checking variance inflation factors
vif(___)

# Estimating new model by removing information on brand
salesModel2 <- lm(salesThisMon ~ . - ___,
                  data = ___)

# Checking variance inflation factors
___
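The variance inflation factor of predictor j is VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors; values well above common rules of thumb (often 5 or 10) signal multicollinearity. The base-R sketch below, on hypothetical simulated data rather than salesData, computes it by hand and shows how dropping one of two nearly collinear predictors brings the VIFs back down. The helper vifByHand is an illustration, not part of rms.

```r
# Hypothetical helper: VIF_j = 1 / (1 - R_j^2) for every column of a
# numeric predictor data frame (an illustration, not the rms implementation)
vifByHand <- function(predictors) {
  sapply(names(predictors), function(j) {
    # Auxiliary regression of predictor j on all remaining predictors
    auxFormula <- reformulate(setdiff(names(predictors), j), response = j)
    r2 <- summary(lm(auxFormula, data = predictors))$r.squared
    1 / (1 - r2)
  })
}

set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)  # nearly a copy of x1 -> strong collinearity
x3 <- rnorm(n)                 # unrelated predictor
predictors <- data.frame(x1, x2, x3)

# Large VIFs for x1 and x2, a VIF near 1 for x3
vifFull <- vifByHand(predictors)
print(round(vifFull, 2))

# Dropping x2 (analogous to "- preferredBrand - nBrands" in the exercise)
vifReduced <- vifByHand(predictors[, c("x1", "x3")])
print(round(vifReduced, 2))
```

The same numbers can be obtained in one line as diag(solve(cor(predictors))), since the diagonal of the inverse correlation matrix of the predictors equals 1 / (1 - R_j^2).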