
Regression: feature selection

1. Regression: feature selection

Welcome back! In the last chapter, you covered pre-processing techniques such as handling missing values, data transformations, and normalizing and scaling. In this chapter, we're going to get into supervised learning methods, starting with how to best select features for a regression model.

2. Selecting the correct features

The motivation for selecting the best subset of features is that it reduces overfitting by removing unimportant features that contribute noise rather than information. The second reason is that it can improve accuracy, since potentially misleading data is removed. Next, because the model is less complex, it is more interpretable. Last, but certainly not least, less data means the machine learning algorithm takes less time to train.

3. Feature selection methods

There are four main types of feature selection methods. The first is filter methods, which rank features based on the statistical relationship between each independent variable and the target variable. Wrapper methods use a machine learning model to evaluate performance. Embedded methods perform an iterative model training process to extract features. Finally, feature importance is offered by a few of the tree-based ML models in scikit-learn.

4. Compare and contrast methods

Filter methods are the only technique that doesn't use a machine learning model, relying on correlation instead. This means the best subset may not always be selected, but it also helps prevent overfitting. The model-based methods, as you can see, have the advantage of selecting the best subset; depending on the parameters, however, this can lead to overfitting.

5. Correlation coefficient statistical tests

The statistical tests for the filter method depend on the data type of the feature and response. In the exercises you'll get to practice creating a Pearson's correlation matrix using the diabetes dataset. This gives the numerical relationship between all the features so that a threshold can be applied as a filter, thus the name filter method.
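As a minimal sketch of that idea, here is one way to build a Pearson correlation matrix on the diabetes dataset with pandas; the exact code in the exercises may differ.

```python
from sklearn.datasets import load_diabetes

# Load the diabetes data as a DataFrame with the target included
data = load_diabetes(as_frame=True)
df = data.frame  # feature columns plus a 'target' column

# .corr() returns the pairwise Pearson correlation matrix
corr_matrix = df.corr()
print(corr_matrix["target"].sort_values(ascending=False))
```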

6. Filter functions

You'll practice with the dot corr function and use the returned matrix to generate a heatmap using the function of the same name from seaborn. You'll also see the absolute value function used to return the features whose correlation exceeds a given threshold in magnitude, since correlations can be negative or positive.
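A rough sketch of those two steps is shown below; the 0.3 cutoff is purely illustrative, not a value from the lesson.

```python
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes

# Rebuild the correlation matrix from the previous sketch
df = load_diabetes(as_frame=True).frame
corr_matrix = df.corr()

# Heatmap of the pairwise Pearson correlations
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm")
plt.show()

# Keep features whose absolute correlation with the target exceeds the threshold
threshold = 0.3
target_corr = corr_matrix["target"].drop("target")
print(target_corr[target_corr.abs() > threshold].index.tolist())
```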

7. Wrapper methods

The wrapper methods themselves include forward selection, which sequentially adds features one at a time based on their contribution to the model; backward elimination, which starts with all of the features and sequentially drops the feature contributing least at each step; a combination of the two, also called bidirectional elimination; and recursive feature elimination. You'll get to practice forward selection using what is called least angle regression, or lars for short, and cross-validated recursive feature elimination with the RFECV function from sklearn.
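As a hedged sketch of the forward-selection flavor, LarsCV adds features along the least angle regression path and uses cross-validation to decide where to stop; the cv=5 setting here is an assumption, not a value from the course.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LarsCV

X, y = load_diabetes(return_X_y=True, as_frame=True)

# Least angle regression adds features one at a time along its path;
# cross-validation picks the stopping point
lars = LarsCV(cv=5).fit(X, y)

# Features left with non-zero coefficients form the selected subset
print(X.columns[lars.coef_ != 0].tolist())
```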

8. Embedded methods

The embedded methods include lasso and ridge regression, and elasticnet which is a hybrid of lasso and ridge. They perform an iterative process which extracts the features that contribute the most during a given iteration to return the best subset dependent on the penalty parameter alpha. I won't say more here as lesson 2 in this chapter is solely devoted to the study of these regularization methods.
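To preview the idea before lesson 2, here is a minimal sketch using LassoCV, where the L1 penalty shrinks uninformative coefficients exactly to zero; the cv and random_state values are illustrative assumptions.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV

X, y = load_diabetes(return_X_y=True, as_frame=True)

# Cross-validation chooses the penalty parameter alpha
lasso = LassoCV(cv=5, random_state=0).fit(X, y)

print("alpha:", lasso.alpha_)
print("kept features:", X.columns[lasso.coef_ != 0].tolist())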

9. Tree-based feature importance methods

And finally, the tree-based machine learning methods, which have built-in feature importance, are the random forest and extra trees algorithms from scikit-learn. The importances are accessed with the feature underscore importances underscore attribute on the model object after fitting.
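A short sketch with a random forest, assuming default impurity-based importances; the hyperparameters are arbitrary illustrations.

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ holds one impurity-based score per feature
importances = pd.Series(rf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```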

10. Additional functions

Some additional functions you'll use in the exercises that follow, all from sklearn, are svm.SVR, the support vector regression estimator, and feature_selection.RFECV, recursive feature elimination with cross-validation. After model fitting, RFECV has attributes to access a boolean array of the selected features with dot support underscore, and the feature ranking, where selected features equal 1, with dot ranking underscore. From linear_model.LinearRegression we get a linear model estimator, and linear_model.LarsCV is least angle regression with cross-validation. After model fitting, these have a dot score method to access the r-squared score, and LarsCV also exposes the estimated regularization parameter alpha with dot alpha underscore.
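A minimal sketch tying these pieces together is below; the linear kernel for SVR and cv=5 are assumptions made so RFECV can rank features by coefficient, not choices taken from the exercises.

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LarsCV
from sklearn.svm import SVR

X, y = load_diabetes(return_X_y=True, as_frame=True)

# RFECV needs an estimator exposing coef_, so use a linear-kernel SVR
rfecv = RFECV(estimator=SVR(kernel="linear"), cv=5).fit(X, y)
print(rfecv.support_)    # boolean mask of the selected features
print(rfecv.ranking_)    # selected features are ranked 1

lars = LarsCV(cv=5).fit(X, y)
print(lars.score(X, y))  # r-squared score
print(lars.alpha_)       # estimated regularization parameter alpha
```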

11. Let's practice!

Alright, time for you to go and select some features!