Practicing Machine Learning Interview Questions in Python

Exercise

Filter and wrapper methods

Questions about reducing the dimensionality of a dataset are very common in machine learning interviews. One way to reduce dimensionality is to keep only the relevant features in your dataset.

Here you'll practice a filter method on the diabetes DataFrame, followed by two different styles of wrapper methods that include cross-validation. You will use pandas, matplotlib.pyplot, and seaborn to visualize correlations, process your data, and apply feature selection techniques to your dataset.
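As a rough illustration of the filter method described above, here is a minimal sketch. It assumes the diabetes DataFrame resembles scikit-learn's bundled diabetes dataset with the target column renamed to progression, and it assumes the 50% threshold applies to each feature's absolute correlation with the target; neither detail is confirmed by the exercise. The helper plot_correlation_heatmap is a hypothetical name used only for this example.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# Assumption: rebuild a `diabetes` DataFrame from scikit-learn's bundled data,
# renaming the target column to `progression` to mirror the exercise.
diabetes = load_diabetes(as_frame=True).frame.rename(columns={"target": "progression"})

# Pairwise Pearson correlations between all columns (features and target).
cor = diabetes.corr()


def plot_correlation_heatmap(cor):
    """Hypothetical helper: draw the correlation matrix as an annotated heatmap."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.heatmap(cor, annot=True, cmap=plt.cm.Reds)
    plt.show()


# plot_correlation_heatmap(cor)  # uncomment to draw the heatmap

# Filter step: absolute correlation of each feature with the target,
# keeping only the features above the 0.5 threshold.
cor_target = cor["progression"].drop("progression").abs()
best_features = cor_target[cor_target > 0.5]
print(best_features)
```

A filter method like this looks only at the data, not at any model, which is why it runs before the wrapper methods that follow.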

The feature matrix, with the target column (progression) dropped, is loaded as X, and the target variable is loaded as y.
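For reference, X and y could be built along these lines. This is only a sketch: it assumes the diabetes DataFrame matches scikit-learn's bundled diabetes dataset with its target column renamed to progression, which may differ from the copy preloaded in the exercise workspace.

```python
from sklearn.datasets import load_diabetes

# Assumption: scikit-learn's bundled diabetes data stands in for the
# exercise's `diabetes` DataFrame, with the target renamed to `progression`.
diabetes = load_diabetes(as_frame=True).frame.rename(columns={"target": "progression"})

# Feature matrix: every column except the target.
X = diabetes.drop("progression", axis=1)

# Target vector: the progression column on its own.
y = diabetes["progression"]

print(X.shape, y.shape)
```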

Note that pandas, matplotlib.pyplot, and seaborn have already been imported to your workspace and aliased as pd, plt, and sns respectively.

Notice you've added a Cross-validate step to your pipeline (which applies to the last 3 steps):

[Figure: Machine learning pipeline]

Instructions

  • 1
    • Create a correlation matrix from diabetes and plot it as a heatmap, then subset the features that have a correlation greater than 0.5 (50%).
  • 2
    • Instantiate a linear-kernel SVR estimator and a feature selector with 5-fold cross-validation, then fit the selector to the features and target.
  • 3
    • Drop the unimportant column found in step 2 from X, then instantiate a LarsCV object and fit it to your data.
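
Steps 2 and 3 above might be sketched as follows. Several details here are assumptions rather than the exercise's confirmed solution: RFECV is used as the cross-validated feature selector (the instructions do not name one), scikit-learn's bundled diabetes data stands in for X and y, and LarsCV is given cv=5 to match step 2. The columns to drop are read from the fitted selector rather than hard-coded, since which feature gets eliminated depends on the data.

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LarsCV
from sklearn.svm import SVR

# Assumption: scikit-learn's bundled diabetes data stands in for X and y.
X, y = load_diabetes(return_X_y=True, as_frame=True)

# Step 2: wrapper method using recursive feature elimination with
# 5-fold cross-validation around a linear-kernel SVR.
svr_mod = SVR(kernel="linear")
feat_selector = RFECV(estimator=svr_mod, cv=5)
feat_selector.fit(X, y)

# support_ is a boolean mask over the columns: True means the feature was kept.
kept = X.columns[feat_selector.support_]
eliminated = X.columns[~feat_selector.support_]
print("Kept:", list(kept))
print("Eliminated:", list(eliminated))

# Step 3: drop whatever the selector eliminated, then fit least-angle
# regression with built-in cross-validation (cv=5 is an assumption).
X_reduced = X.drop(columns=eliminated)
lars_mod = LarsCV(cv=5)
lars_mod.fit(X_reduced, y)

# Training R^2, the regularization strength chosen by CV, and the coefficients.
print("R^2:", lars_mod.score(X_reduced, y))
print("alpha:", lars_mod.alpha_)
print("coefficients:", dict(zip(X_reduced.columns, lars_mod.coef_)))
```

Both estimators select features by repeatedly fitting a model, which is what distinguishes these wrapper methods from the correlation-based filter in step 1.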