Compute VIF
As you learned in the video, one of the most widely used diagnostics for multicollinearity is the variance inflation factor (VIF), which is computed for each explanatory variable.
Recall from the video that the rule-of-thumb threshold is a VIF of 2.5: if a variable's VIF is above 2.5, you should consider that multicollinearity is affecting your fitted model.
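The VIF for explanatory variable x_j is VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination from regressing x_j on all the other explanatory variables plus an intercept. The minimal numpy sketch below computes it directly from that definition. It rests on some assumptions. The data are synthetic and only borrow the names weight, width and color, and width is deliberately built as a near-linear function of weight. The helper vif_from_definition is a hypothetical name and is not part of any library.

```python
import numpy as np


def vif_from_definition(X: np.ndarray) -> np.ndarray:
    """Return VIF_j = 1 / (1 - R_j^2) for each column j of X.

    X holds only the explanatory variables (no intercept column);
    each auxiliary regression adds its own intercept.
    """
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Auxiliary design matrix: intercept plus the remaining variables
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        # Centered R^2 of the auxiliary regression
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs


# Hypothetical synthetic data, not the course's crab dataset
rng = np.random.default_rng(0)
n = 200
weight = rng.normal(2.4, 0.5, n)
width = 20.0 + 4.0 * weight + rng.normal(0, 0.5, n)  # strongly tied to weight
color = rng.integers(1, 5, n).astype(float)            # independent of the others
X = np.column_stack([weight, width, color])

vifs = vif_from_definition(X)
for name, v in zip(["weight", "width", "color"], vifs):
    flag = "above 2.5" if v > 2.5 else "below 2.5"
    print(f"{name:>6}: VIF = {v:6.2f} ({flag})")
```

A VIF of 1 means x_j is uncorrelated with the other predictors. With this construction, weight and width should both land far above the 2.5 threshold, while color should stay close to 1.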
The previously fitted model and crab dataset are preloaded in the workspace.
This exercise is part of the course “Generalized Linear Models in Python”.
Exercise instructions
- From statsmodels, import variance_inflation_factor.
- From the crab dataset, choose weight, width and color and save them as X. Add an Intercept column of ones to X.
- Using the pandas function DataFrame(), create an empty vif dataframe and store the column names of X in a column named variables.
- For each variable, compute the VIF using the variance_inflation_factor() function and save it in the vif dataframe under the column name VIF.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import functions
from statsmodels.stats.outliers_influence import ____
# Get variables for which to compute VIF and add intercept term
X = ____[[____, ____, ____]]
X[____] = 1
# Compute and view VIF
vif = pd.____
vif["variables"] = X.____
vif["VIF"] = [____(X.values, i) for i in range(X.shape[1])]
# View results using print
____(____)
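Here is one possible completion of the workflow, offered as a sketch. The preloaded crab data and fitted model are not available outside the exercise workspace, so the sketch builds a hypothetical synthetic crab DataFrame with the same three column names. The numbers it prints are not the course's results. variance_inflation_factor(exog, exog_idx) from statsmodels regresses column exog_idx of exog on all the other columns of exog, which is why the Intercept column has to be added to X by hand. The .copy() call is an addition beyond the template. It avoids pandas' SettingWithCopyWarning when a new column is added to a column subset.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical stand-in for the preloaded crab dataset (synthetic values)
rng = np.random.default_rng(1)
n = 150
weight = rng.normal(2.4, 0.5, n)
crab = pd.DataFrame({
    "weight": weight,
    "width": 20.0 + 4.0 * weight + rng.normal(0, 0.5, n),
    "color": rng.integers(1, 5, n),
})

# Select the explanatory variables and add an intercept column of ones
X = crab[["weight", "width", "color"]].copy()
X["Intercept"] = 1

# One row per column of X: its name and its VIF
vif = pd.DataFrame()
vif["variables"] = X.columns
vif["VIF"] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]

# View results
print(vif)
```

The rule of thumb applies to the explanatory variables, and the Intercept row is typically not interpreted. Without the column of ones, statsmodels fits each auxiliary regression without a constant and reports an uncentered R², so the resulting VIF values would generally differ from the ones defined above.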
Extend your regression toolbox with the logistic and Poisson models and learn to train, understand, and validate them, as well as to make predictions.
In this final chapter you'll learn how to increase the complexity of your model by adding more than one explanatory variable. You'll practice handling multicollinearity and treating categorical variables and interaction terms in your model.