Multicollinearity techniques - PCA
In the last exercise, you used feature engineering to combine the s1 and s2 independent variables as s1_s2, since they displayed the highest correlation in the diabetes dataset.
In this exercise, you'll perform PCA on diabetes to remove multicollinearity before you apply Linear Regression to it. Then, you'll compare the output metrics to those from the last exercise. Finally, you'll visualize what the correlation matrix and heatmap of the dataset look like after PCA, since the principal components are uncorrelated with one another.
This exercise is part of the course
Practicing Machine Learning Interview Questions in Python
Hands-on interactive exercise
Try this exercise by completing the sample code.
# Import
from sklearn.decomposition import ____
# Instantiate
pca = ____()
# Fit on train
pca.____(____)
# Transform train and test
X_trainPCA = pca.____(____)
X_testPCA = pca.____(____)
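One possible way to fill in the steps above is sketched below. It is a minimal illustration, not the course's reference solution. It assumes that scikit-learn's built-in load_diabetes data can stand in for the exercise's diabetes dataset, and that a 70/30 train_test_split with random_state=42 is an acceptable split. Neither of these is stated in the exercise. The sketch also fits a Linear Regression on the transformed features and checks numerically that the principal components are uncorrelated on the training data.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's diabetes data stands in for the course's `diabetes` dataset.
X, y = load_diabetes(return_X_y=True)

# Assumption: the split size and seed are illustrative, not taken from the exercise.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Instantiate PCA, keeping all components
pca = PCA()

# Fit on the training data only, so no information leaks from the test set
pca.fit(X_train)

# Project both train and test onto the components learned from train
X_trainPCA = pca.transform(X_train)
X_testPCA = pca.transform(X_test)

# Fit Linear Regression on the transformed features and score it on the test set
lr = LinearRegression().fit(X_trainPCA, y_train)
y_pred = lr.predict(X_testPCA)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.2f}, R-squared: {r2:.3f}")

# Correlation matrix of the transformed training features (columns are variables)
corr = np.corrcoef(X_trainPCA, rowvar=False)

# Largest absolute off-diagonal entry; it should be zero up to floating-point error
off_diag = corr - np.diag(np.diag(corr))
max_off_diag = np.abs(off_diag).max()
print(f"Largest off-diagonal correlation: {max_off_diag:.2e}")
```

The correlation matrix corr can be passed to a plotting function such as seaborn's heatmap to draw the heatmap mentioned in the exercise. The uncorrelated property is exact only on the data PCA was fitted on, so the transformed test set may show small nonzero correlations.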