Multicollinearity techniques - feature engineering

Multicollinearity is a common issue that can affect model performance in any machine learning context. Being able to discuss it clearly could take your explanation of modeling from good to great and set you apart in an interview.

In this exercise, you'll practice creating a baseline model using Linear Regression on the diabetes dataset and explore some of the output metrics. Then you'll practice techniques to visually explore the correlation between the independent variables, before finally performing feature engineering on two variables that are highly correlated.
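
The correlation check and feature engineering step described above might look roughly like the sketch below. It assumes scikit-learn's bundled `load_diabetes` data stands in for the course dataset. Replacing the most correlated pair with its row-wise mean is one illustrative remedy, not necessarily the one the exercise expects, and `plot_correlation_heatmap` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes

# Assumption: scikit-learn's bundled diabetes data stands in for the course dataset.
X = load_diabetes(as_frame=True).data

# Pairwise Pearson correlations between the independent variables.
corr = X.corr()


def plot_correlation_heatmap(corr_matrix):
    """Hypothetical helper: draw an annotated heatmap of a correlation matrix."""
    # Imported here so the numeric part below runs without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
    plt.show()


# Find the pair of distinct features with the largest absolute correlation.
abs_corr = corr.abs().to_numpy().copy()
np.fill_diagonal(abs_corr, 0.0)  # ignore each feature's correlation with itself
i, j = np.unravel_index(abs_corr.argmax(), abs_corr.shape)
feat_a, feat_b = corr.columns[i], corr.columns[j]
print("Most correlated pair:", feat_a, feat_b)

# Illustrative remedy: replace the two correlated columns with their mean.
X_eng = X.drop(columns=[feat_a, feat_b])
X_eng[f"{feat_a}_{feat_b}"] = X[[feat_a, feat_b]].mean(axis=1)
```

Calling `plot_correlation_heatmap(corr)` gives the visual view of the same matrix. Dropping one of the two columns is another common option; combining them keeps some information from both.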

For the first two steps, use X_train, X_test, y_train, and y_test, which have been loaded into your workspace.

Additionally, all relevant packages have been imported for you: pandas as pd, train_test_split from sklearn.model_selection, LinearRegression from sklearn.linear_model, mean_squared_error and r2_score from sklearn.metrics, matplotlib.pyplot as plt and seaborn as sns.

This exercise is part of the course

Practicing Machine Learning Interview Questions in Python

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Instantiate, fit, predict
lin_mod = ____()
lin_mod.____(____, ____)
y_pred = lin_mod.____(____)

# Coefficient estimates
print('Coefficients: \n', lin_mod.____)

# Mean squared error
print("Mean squared error: %.2f"
      % ____(____, ____))

# R-squared (coefficient of determination)
print('R_squared score: %.2f' % ____(____, ____))
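
For reference, one way the scaffold above could be completed is sketched below. Because the exercise workspace is not available here, the sketch builds its own split from scikit-learn's bundled diabetes data; the `test_size` and `random_state` values are assumptions, so the printed numbers need not match the course's.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Assumption: recreate a train/test split, since the course workspace is not available.
X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Instantiate, fit, predict
lin_mod = LinearRegression()
lin_mod.fit(X_train, y_train)
y_pred = lin_mod.predict(X_test)

# Coefficient estimates, one per feature
print('Coefficients: \n', lin_mod.coef_)

# Mean squared error on the held-out data
mse = mean_squared_error(y_test, y_pred)
print("Mean squared error: %.2f" % mse)

# R-squared (coefficient of determination) on the held-out data
r2 = r2_score(y_test, y_pred)
print('R_squared score: %.2f' % r2)
```

Both metric functions take the true values first and the predictions second.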