Multicollinearity techniques - PCA
In the last exercise you used feature engineering to combine the s1 and s2 independent variables as s1_s2, since they displayed the highest correlation in the diabetes dataset.
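For context, here is a minimal sketch of what that feature-engineering step might have looked like, assuming the data is loaded into a pandas DataFrame and that s1 and s2 were combined by averaging; the exact combination used in the previous exercise may differ.
# Hypothetical reconstruction of the previous exercise's step;
# averaging s1 and s2 is an assumption, not the official solution.
import pandas as pd
from sklearn.datasets import load_diabetes
diab = load_diabetes()
X = pd.DataFrame(diab.data, columns=diab.feature_names)
# Combine the two most correlated features into one and drop the originals
X["s1_s2"] = X[["s1", "s2"]].mean(axis=1)
X = X.drop(columns=["s1", "s2"])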
In this exercise, you'll perform PCA on diabetes to remove multicollinearity before applying Linear Regression to it. Then, you'll compare the output metrics to those from the last exercise. Finally, you'll visualize the correlation matrix and heatmap of the transformed dataset: because the principal components are mutually uncorrelated, PCA removes the multicollinearity entirely (see the end-to-end sketch after the sample code).
This exercise is part of the course Practicing Machine Learning Interview Questions in Python.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import
from sklearn.decomposition import ____
# Instantiate
pca = ____()
# Fit on train
pca.____(____)
# Transform train and test
X_trainPCA = pca.____(____)
X_testPCA = pca.____(____)
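For reference, below is a minimal end-to-end sketch of the filled-in exercise. It assumes a simple train/test split of the diabetes data and uses mean squared error and R-squared as the comparison metrics; the variable names, split, and metric choices are assumptions, not the course's official solution.
# A minimal end-to-end sketch; split, metrics, and variable names are assumptions.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Load the diabetes data and split it into train and test sets
diab = load_diabetes()
X = pd.DataFrame(diab.data, columns=diab.feature_names)
y = diab.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# Fit PCA on the training set only, then transform both splits
pca = PCA()
pca.fit(X_train)
X_trainPCA = pca.transform(X_train)
X_testPCA = pca.transform(X_test)
# Fit Linear Regression on the principal components and score it
lr = LinearRegression()
lr.fit(X_trainPCA, y_train)
y_pred = lr.predict(X_testPCA)
print("MSE:", mean_squared_error(y_test, y_pred))
print("R^2:", r2_score(y_test, y_pred))
# The principal components are uncorrelated, so the off-diagonal entries
# of this correlation matrix (and heatmap) should be essentially zero
corrPCA = pd.DataFrame(X_trainPCA).corr()
sns.heatmap(corrPCA, annot=True, fmt=".2f")
plt.show()
Note that PCA is fit on the training set only and the test set is merely transformed, which avoids leaking information from the test data into the components.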