
Combining feature selectors

1. Combining feature selectors

In the previous lesson we saw how Lasso models allow you to tweak the strength of regularization with the alpha parameter.

2. Lasso regressor

We manually set this alpha parameter to find a balance between removing as many features as possible and keeping model accuracy high. However, manually finding a good alpha value can be tedious. The good news is that there is a way to automate this.

3. LassoCV regressor

The LassoCV() class uses cross-validation to try out different alpha settings and select the best one. When we fit this model to our training data, it gets an alpha_ attribute holding the optimal value.
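As a minimal sketch, assuming X_train and y_train already hold our training features and target, this could look something like:

```python
from sklearn.linear_model import LassoCV

# LassoCV tries a range of alpha values using cross-validation
lcv = LassoCV()
lcv.fit(X_train, y_train)

# The alpha value that performed best during cross-validation
print(lcv.alpha_)
```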

4. LassoCV regressor

To actually remove the features to which the Lasso regressor assigned a zero coefficient, we once again create a mask with True values for all non-zero coefficients. We can then apply it to our feature DataFrame X with the loc method.
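Continuing that sketch, and assuming X is the pandas DataFrame holding all the features, the masking step might look like:

```python
# True for every feature to which the Lasso assigned a non-zero coefficient
mask = lcv.coef_ != 0

# Keep only those columns of the feature DataFrame X
reduced_X = X.loc[:, mask]
```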

5. Taking a step back

Two lessons ago we talked about random forests; they're an example of an ensemble model, built on the idea that many weak predictors can combine to form a strong one. When we use models to perform feature selection, we can apply the same idea. Instead of trusting a single model to tell us which features are important, we can have multiple models each cast a vote on whether to keep a feature, and then combine the votes to make a decision.

6. Feature selection with LassoCV

To do so, let's first train the models one by one. We'll be predicting BMI in the ANSUR dataset, just like you did in the last exercises. Using LassoCV() we get an R squared of 99%, and when we create a mask that tells us whether a feature has a coefficient different from 0, we see that this is the case for 66 out of 91 features. We'll put this lcv_mask to the side for a moment and move on to the next model.
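A sketch of these steps, assuming X_train, X_test, y_train, and y_test are splits of the ANSUR features and the BMI target:

```python
from sklearn.linear_model import LassoCV

lcv = LassoCV()
lcv.fit(X_train, y_train)

# R squared on the test set (around 0.99 here)
print(lcv.score(X_test, y_test))

# Mask of features with a non-zero coefficient, and how many that is
lcv_mask = lcv.coef_ != 0
print(lcv_mask.sum())  # 66 out of 91 features
```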

7. Feature selection with random forest

The second model we train is a random forest regressor. We've wrapped a Recursive Feature Eliminator, or RFE, around the model to have it select the same number of features as the LassoCV() regressor did. We can then use the support_ attribute of the fitted RFE to create rf_mask.
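A sketch of this wrapper, where the step value is just an illustrative choice for how many features to drop per iteration:

```python
from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestRegressor

# Recursive Feature Elimination around a random forest, keeping the same
# number of features that LassoCV kept (66); step is an illustrative choice
rfe_rf = RFE(estimator=RandomForestRegressor(),
             n_features_to_select=66, step=10, verbose=1)
rfe_rf.fit(X_train, y_train)

# Boolean mask of the features the RFE decided to keep
rf_mask = rfe_rf.support_
```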

8. Feature selection with gradient boosting

Then, we do the same thing with a gradient boosting regressor. Like random forests, gradient boosting is an ensemble method that calculates feature importance values. The fitted RFE wrapper again has a support_ attribute that we can use to create gb_mask.
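The sketch is nearly identical; only the estimator changes:

```python
from sklearn.feature_selection import RFE
from sklearn.ensemble import GradientBoostingRegressor

# Same RFE wrapper, now around a gradient boosting regressor
rfe_gb = RFE(estimator=GradientBoostingRegressor(),
             n_features_to_select=66, step=10, verbose=1)
rfe_gb.fit(X_train, y_train)

# Boolean mask from the fitted RFE wrapper
gb_mask = rfe_gb.support_
```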

9. Combining the feature selectors

Finally, we can start counting the votes on whether to select a feature. We use NumPy's sum() function, pass it the three masks in a list, and set the axis argument to 0. This gives us an array with the number of votes each feature received. What we do with these votes depends on how conservative we want to be. If we want to make sure we don't lose any information, we could select all features with at least one vote. In this example we choose to keep a feature only if at least two models voted for it. All that is left is to actually implement the dimensionality reduction, which we do with the loc method of our feature DataFrame X.
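Putting the three masks together, a sketch of the voting step could be:

```python
import numpy as np

# Count, per feature, how many of the three selectors voted to keep it
votes = np.sum([lcv_mask, rf_mask, gb_mask], axis=0)

# Keep a feature only if at least two models voted for it
meta_mask = votes >= 2

# Apply the combined mask to the feature DataFrame X
X_reduced = X.loc[:, meta_mask]
```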

10. Let's practice!

Now it's your turn to combine the feature selection capabilities of multiple models.
