
Avoiding multicollinearity

Let's return to the sales dataset salesData, which is already loaded in the workspace. The rms package is loaded as well.

Let's estimate a multiple linear regression that uses all of the variables in the dataset as predictors.
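Using every variable is convenient, but when some predictors carry nearly the same information, their coefficient estimates become unstable and their standard errors inflate. The following R sketch illustrates this on hypothetical simulated data (the names sim, x1, x2 and x3 are invented for illustration and are not part of salesData): adding a near-copy of x1 to the model sharply increases the standard error of the x1 coefficient.

```r
# Hypothetical simulated data, not salesData: x2 is almost a copy of x1
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)  # nearly collinear with x1
x3 <- rnorm(n)                  # unrelated predictor
y  <- 2 * x1 + 0.5 * x3 + rnorm(n)
sim <- data.frame(y, x1, x2, x3)

# Model with all columns (y ~ .) vs. a model without the redundant x2
fit_all     <- lm(y ~ ., data = sim)
fit_reduced <- lm(y ~ . - x2, data = sim)

# Standard error of the x1 coefficient in each model
se_all     <- summary(fit_all)$coefficients["x1", "Std. Error"]
se_reduced <- summary(fit_reduced)$coefficients["x1", "Std. Error"]
print(c(all = se_all, reduced = se_reduced))
```

Dropping the redundant predictor leaves the fit of y largely unchanged but makes the x1 coefficient far more precise, which is the motivation for checking variance inflation factors.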

This exercise is part of the course Machine Learning for Marketing Analytics in R.


Exercise instructions

  • Estimate a full model called salesModel1 that explains this month's sales (salesThisMon) using all variables except id. To do this, fill the right variable names into the template response ~ . - excluded_variable, which reads as "response modeled by all variables except excluded_variable."
  • Compute the variance inflation factors (VIFs) of salesModel1 using the vif() function from the rms package.
  • In addition to excluding id, remove the variables preferredBrand and nBrands to avoid multicollinearity. To do this, add each of them to the formula preceded by a minus sign (-). Store the model in an object called salesModel2.
  • Recompute the variance inflation factors for salesModel2. Would you accept the results now?
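To see how the formula template expands, the sketch below applies terms() to a small made-up data frame whose column names mirror some of those used in this exercise (the values and the nItems column are hypothetical, not taken from salesData). The dot stands for every column except the response, and each - name removes one column from that expansion.

```r
# Hypothetical toy data frame: column names mirror the exercise,
# but the values and the nItems column are invented for illustration
toy <- data.frame(
  id             = 1:6,
  salesThisMon   = c(10, 12, 9, 15, 11, 13),
  nBrands        = c(2, 3, 1, 4, 2, 3),
  preferredBrand = factor(c("A", "B", "A", "C", "B", "C")),
  nItems         = c(5, 7, 4, 9, 6, 8)
)

# "." expands to every column except the response
all_terms <- attr(terms(salesThisMon ~ ., data = toy), "term.labels")

# Each "- name" then drops that column from the expansion
reduced_terms <- attr(
  terms(salesThisMon ~ . - id - preferredBrand - nBrands, data = toy),
  "term.labels"
)

print(all_terms)
print(reduced_terms)
```

Here all_terms lists id, nBrands, preferredBrand and nItems, while reduced_terms keeps only nItems.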

Hands-on interactive exercise

Try this exercise with the code below.

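# Load rms for vif() (already loaded in the exercise workspace)
library(rms)

# Estimating the full model: all variables except id
salesModel1 <- lm(salesThisMon ~ . - id, 
                 data = salesData)

# Checking variance inflation factors
vif(salesModel1)

# Estimating new model by removing information on brand
salesModel2 <- lm(salesThisMon ~ . - id - preferredBrand - nBrands, 
                 data = salesData)

# Checking variance inflation factors
vif(salesModel2)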
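The vif() function reports one value per coefficient. As a base-R sketch of what those numbers mean, the helper vif_by_definition() below (a hypothetical name, not part of rms) computes VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing predictor j on all the other predictors. It assumes purely numeric predictors and uses simulated data rather than salesData. For such predictors in a model with an intercept, the values coincide with the diagonal of the inverse correlation matrix of the predictors, which the last line checks.

```r
# Hypothetical helper: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from
# regressing predictor j on all other predictors (numeric predictors only)
vif_by_definition <- function(X) {
  X <- as.data.frame(X)
  vapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    aux <- lm(reformulate(others, response = v), data = X)
    1 / (1 - summary(aux)$r.squared)
  }, numeric(1))
}

# Hypothetical simulated predictors: x2 nearly duplicates x1
set.seed(42)
n  <- 500
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

vifs <- vif_by_definition(X)
print(round(vifs, 1))

# Same values via the diagonal of the inverse correlation matrix
vifs_alt <- diag(solve(cor(X)))
print(all.equal(unname(vifs), unname(vifs_alt)))
```

A common rule of thumb treats VIFs above 5, and especially above 10, as a warning sign; in this simulated data x1 and x2 far exceed that level while x3 stays close to 1. Factor predictors such as preferredBrand expand into several dummy columns, which this simple helper does not handle.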