1. Supervised feature selection
Welcome back. Up to now, we have dealt with unsupervised feature selection methods.
2. Unsupervised vs. supervised feature selection
There are two major approaches to feature selection:
3. Unsupervised vs. supervised feature selection
Unsupervised and supervised. The distinction mirrors the one between supervised and unsupervised machine learning techniques. In short, it comes down to whether the target variable's values are used in the selection process.
4. Unsupervised vs. supervised feature selection
The methods we have discussed so far, such as dropping features with many missing values, low variance, or high correlation with other features, are unsupervised: they rely solely on the information contained in the individual features. Supervised feature selection, by contrast, selects features based on model performance or on how much information each feature provides about the target variable.
5. Supervised feature selection explained
In other words, supervised feature selection uses the information each predictor provides about the target variable to identify and keep the most important features. The selection is based on a feature importance criterion such as information gain or regression coefficient magnitude.
Remember that information gain in decision trees is defined as the entropy reduction a predictor variable provides about the target variable. That was a form of supervised feature selection.
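To make that concrete, here is a minimal sketch of computing information gain by hand with NumPy. The helper names and the toy arrays are made up for illustration; they are not part of the course's code.

```python
import numpy as np

def entropy(y):
    """Shannon entropy (in bits) of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(x, y):
    """Entropy reduction in y from splitting on a categorical predictor x."""
    weighted = sum((x == v).mean() * entropy(y[x == v]) for v in np.unique(x))
    return entropy(y) - weighted

# Toy labels plus one informative and one uninformative predictor
y             = np.array([0, 0, 1, 1])
x_informative = np.array([0, 0, 1, 1])  # separates the classes perfectly
x_noise       = np.array([0, 1, 0, 1])  # tells us nothing about y

print(information_gain(x_informative, y))  # 1.0 bit
print(information_gain(x_noise, y))        # 0.0 bits
```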
6. Unsupervised vs. supervised feature selection
Another example of supervised feature selection is recursive feature elimination. It starts by fitting a model on all features, ranks them by importance, removes the weakest, and refits the model, repeating this process until the desired number of features remains.
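As a sketch of how this looks in code, here is recursive feature elimination with scikit-learn's RFE. The breast cancer dataset, the logistic regression estimator, and the choice of five features are illustrative assumptions, not the course's exact setup.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_scaled = StandardScaler().fit_transform(X)  # scaling helps the estimator converge

# Fit, rank features by coefficient magnitude, drop the weakest,
# and refit -- repeating until five features remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=5, step=1)
rfe.fit(X_scaled, y)

print(X.columns[rfe.support_])  # the five surviving features
```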
7. Unsupervised vs. supervised feature selection
Another supervised feature selection method is lasso regression. It penalizes regression coefficients, shrinking them toward zero; the coefficients of less important features are shrunk all the way to zero, eliminating those features from the model. We'll be learning more about lasso regression later.
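For a feel of the shrinkage, here is a minimal sketch with scikit-learn's Lasso. The diabetes dataset and the alpha value are illustrative assumptions; how many coefficients reach exactly zero depends on alpha.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_scaled = StandardScaler().fit_transform(X)

# The L1 penalty shrinks coefficients; for weak predictors,
# it pushes them to exactly zero.
lasso = Lasso(alpha=1.0)  # alpha controls the penalty strength
lasso.fit(X_scaled, y)

print(X.columns[lasso.coef_ != 0])  # features the lasso kept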
8. Unsupervised vs. supervised feature selection
Finally, random forest models also naturally perform supervised feature selection. A random forest is an ensemble of many decision trees, each built from a random subset of the observations and predictors. Features that consistently produce good splits across the trees rank as the most important, so the forest's importance scores can be used to select the best features. We will also dive deeper into random forest models later on.
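As a sketch, a fitted scikit-learn random forest exposes per-feature importance scores (mean impurity reduction across the trees) that can be used directly for selection. The dataset and hyperparameter choices here are illustrative assumptions.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Mean impurity reduction each feature achieves across all trees
importances = pd.Series(forest.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head())
```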
9. Let's practice!
For now, let's practice.