Filter and wrapper methods

Questions about reducing the dimensionality of a dataset are highly common in machine learning interviews. One way to reduce the dimensionality of a dataset is by only selecting relevant features in your dataset.

Here you'll practice a filter method on the diabetes DataFrame followed by 2 different styles of wrapper methods that include cross-validation. You will be using pandas, matplotlib.pyplot and seaborn to visualize correlations, process your data and apply feature selection techniques to your dataset.

The feature matrix with the dropped target variable column (progression) is loaded as X, while the target variable is loaded as y.

Note that pandas, matplotlib.pyplot, and seaborn have already been imported to your workspace and aliased as pd, plt, and sns respectively.

Notice you've added a Cross-validate step to your pipeline (which applies to the last 3 steps):

Machine learning pipeline

This exercise is part of the course

Practicing Machine Learning Interview Questions in Python

View Course

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create correlation matrix and print it
cor = ____.____()
print(____)

# Correlation matrix heatmap
plt.figure()
sns.____(____, annot=True, cmap=plt.cm.Reds)
plt.show()

# Correlation with output variable
cor_target = abs(cor["progression"])

# Selecting highly correlated features
best_features = ____[____ > ____]
print(____)

Edit and Run Code