Supervised feature selection

1. Supervised feature selection

Welcome back. Up to now, we have dealt with unsupervised feature selection methods.

2. Unsupervised vs. supervised feature selection

There are two major approaches to feature selection:

3. Unsupervised vs. supervised feature selection

Unsupervised and supervised. The distinction mirrors the one between unsupervised and supervised machine learning: in short, it comes down to whether the target variable's values are used in the selection process.

4. Unsupervised vs. supervised feature selection

The methods we have discussed so far, such as dropping features with missing values, low variance, or high correlations, are unsupervised: they rely solely on information contained in the features themselves. Supervised feature selection, by contrast, selects features based on model performance or on how much information each feature provides about the target variable.
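
As a quick recap, here is a minimal sketch of one of these unsupervised checks, dropping a zero-variance feature with scikit-learn's VarianceThreshold; the tiny DataFrame is made up for illustration:

```python
# Unsupervised selection: drop features with zero variance.
# Note that the target variable is never consulted.
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

X = pd.DataFrame({
    "constant": [1.0, 1.0, 1.0, 1.0],   # zero variance: dropped
    "useful":   [0.2, 1.4, 3.1, 2.7],   # kept
})

selector = VarianceThreshold(threshold=0.0)  # remove zero-variance features
X_reduced = selector.fit_transform(X)
print(X.columns[selector.get_support()])  # Index(['useful'], dtype='object')
```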

5. Supervised feature selection explained

In other words, supervised feature selection uses the information each predictor provides about the target variable to identify and keep the most important features. The selection is based on a feature importance criterion, such as information gain or regression coefficient magnitude. Recall that information gain in decision trees is the reduction in entropy a predictor variable provides about the target variable; choosing splits that way was already a form of supervised feature selection.
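
As a sketch of criterion-based selection, the snippet below scores each feature by its mutual information with the target, an information-gain-style measure, and keeps the top two; the iris dataset and the choice of k=2 are illustrative assumptions:

```python
# Supervised selection: score features by mutual information with the
# target, then keep the k highest-scoring ones.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Keep the two features sharing the most information with the target
selector = SelectKBest(score_func=mutual_info_classif, k=2)
X_selected = selector.fit_transform(X, y)

print(selector.scores_)   # one information-gain-style score per feature
print(X_selected.shape)   # (150, 2)
```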

6. Unsupervised vs. supervised feature selection

Another example of supervised feature selection is recursive feature elimination. It begins by fitting a model with all features, ranks the features by importance, removes the weakest ones, and refits the model. This process repeats until the desired number of features remains.
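
Here is a minimal sketch of that loop using scikit-learn's RFE; the logistic regression estimator, the breast cancer dataset, and the target of five features are illustrative choices, not the course's canonical setup:

```python
# Recursive feature elimination: repeatedly fit, rank by coefficient
# magnitude, drop the weakest feature, and refit until five remain.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # helps the estimator converge

rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=5)
rfe.fit(X, y)

print(rfe.support_)   # boolean mask of the surviving features
print(rfe.ranking_)   # 1 = kept; higher = eliminated earlier
```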

7. Unsupervised vs. supervised feature selection

Another supervised feature selection method is lasso regression. It penalizes regression coefficients, shrinking them toward zero. The coefficients of less important features are shrunk all the way to zero, eliminating those features from the model. We'll be learning more about lasso regression later.
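
A minimal sketch of lasso zeroing out coefficients, assuming scikit-learn; the diabetes dataset and the alpha value are illustrative:

```python
# Lasso: the L1 penalty shrinks some coefficients exactly to zero,
# effectively removing those features from the model.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)  # lasso is sensitive to feature scale

lasso = Lasso(alpha=1.0)  # larger alpha => stronger shrinkage
lasso.fit(X, y)

print(lasso.coef_)                           # several entries are exactly 0
print(np.sum(lasso.coef_ != 0), "features kept")
```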

8. Unsupervised vs. supervised feature selection

Finally, random forest models naturally perform supervised feature selection as well. A random forest is an ensemble of many decision trees, each built from a random subset of the observations and predictors. Features that consistently produce the largest impurity reductions across the trees receive the highest importance scores, and those scores can be used to keep the best features. We will also dive deeper into random forest models later on.
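
A minimal sketch of reading the importance scores a random forest produces as a by-product of training, assuming scikit-learn; the iris dataset is just for illustration:

```python
# Random forest feature importances: the mean impurity reduction each
# feature achieves across all trees in the ensemble.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(data.data, data.target)

for name, score in zip(data.feature_names, forest.feature_importances_):
    print(f"{name}: {score:.3f}")
```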

9. Let's practice!

For now, let's practice.