
Selecting features for model performance

1. Selecting features for model performance

In previous chapters, we've always looked at individual or pairwise properties of features to decide whether to keep them in the dataset. Another, more pragmatic, approach is to select features based on how they affect model performance.

2. Ansur dataset sample

Consider this sample of the ANSUR dataset with one target variable, "Gender" which we'll try to predict, and five body measurement features to do so.

3. Pre-processing the data

To train a model on this data we should first perform a train-test split, and in this case also standardize the training feature dataset X_train to have a mean of zero and a variance of one. Notice that we've used the .fit_transform() method to fit the scaler and transform the data in one command.
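The steps above can be sketched as follows. The feature names and the randomly generated data are stand-ins for the actual ANSUR sample, which isn't reproduced here:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in for the ANSUR sample: assumed feature names, synthetic values
features = ["neckcircumference", "chestdepth", "handlength",
            "footlength", "earlength"]
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 5)), columns=features)
y = rng.integers(0, 2, size=200)  # stand-in for the Gender target

# Split before scaling so the test set stays unseen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

scaler = StandardScaler()
# .fit_transform() fits the scaler and transforms X_train in one call
X_train_std = scaler.fit_transform(X_train)

print(X_train_std.mean(axis=0).round(2))  # each feature now has mean ~0
print(X_train_std.std(axis=0).round(2))   # and standard deviation ~1
```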

4. Creating a logistic regression model

We can then create and fit a logistic regression model on this standardized training data. To see how the model performs on the test set, we first scale these features with the .transform() method of the scaler that we just fit on the training set and then make our prediction. We get a test-set accuracy of 99%.
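A minimal sketch of this fit-and-evaluate step. Since the ANSUR data isn't included here, a synthetic classification dataset stands in, so the accuracy printed is illustrative rather than the 99% quoted in the text:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the 5-feature ANSUR sample
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)  # fit on training data only
X_test_std = scaler.transform(X_test)        # reuse the fitted scaler

lr = LogisticRegression()
lr.fit(X_train_std, y_train)

y_pred = lr.predict(X_test_std)
acc = accuracy_score(y_test, y_pred)
print(f"{acc:.1%} accuracy on test set.")
```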

5. Inspecting the feature coefficients

However, when we look at the feature coefficients that the logistic regression model uses in its decision function, we'll see that some values are pretty close to zero. Since these coefficients are multiplied with the feature values when the model makes a prediction, features with coefficients close to zero contribute little to the end result. We can use the zip function to transform the output into a dictionary that shows which feature has which coefficient. If we want to remove a feature from the initial dataset with as little effect on the predictions as possible, we should pick the one whose coefficient is closest to zero, "handlength" in this case. The fact that we standardized the data first makes sure that we can compare the coefficients to one another.
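The coefficient dictionary can be built with zip() like this. The feature names and data are again hypothetical stand-ins for the ANSUR sample, so the weakest feature found here won't necessarily be "handlength":

```python
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Assumed feature names, synthetic stand-in data
features = ["neckcircumference", "chestdepth", "handlength",
            "footlength", "earlength"]
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

lr = LogisticRegression().fit(StandardScaler().fit_transform(X), y)

# Zip the feature names with the fitted coefficients into a dictionary
coef_dict = dict(zip(features, lr.coef_[0]))
print(coef_dict)

# The safest feature to drop is the one whose coefficient is closest to zero
weakest = min(coef_dict, key=lambda name: abs(coef_dict[name]))
print(weakest)
```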

6. Features that contribute little to a model

When we remove the "handlength" feature at the start of the process, our model accuracy remains unchanged at 99% while we did reduce our dataset's complexity. We could repeat this process until we have the desired number of features remaining, but it turns out, there's a Scikit-learn function that does just that.
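One way to check this claim on stand-in data is to drop the column with the weakest coefficient and refit; on well-behaved data the accuracy should barely move. Everything here (data, helper function) is a hypothetical sketch, not the course's actual code:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def fit_and_score(X_tr, X_te):
    # Helper (hypothetical): standardize, fit, and return test accuracy
    sc = StandardScaler()
    lr = LogisticRegression().fit(sc.fit_transform(X_tr), y_train)
    return lr.score(sc.transform(X_te), y_test)

acc_all = fit_and_score(X_train, X_test)

# Find the coefficient closest to zero and drop that column
sc = StandardScaler()
lr = LogisticRegression().fit(sc.fit_transform(X_train), y_train)
weakest = np.argmin(np.abs(lr.coef_[0]))
keep = np.arange(X.shape[1]) != weakest

acc_reduced = fit_and_score(X_train[:, keep], X_test[:, keep])
print(acc_all, acc_reduced)
```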

7. Recursive Feature Elimination

RFE for "Recursive Feature Elimination" is a feature selection algorithm that can be wrapped around any model that produces feature coefficients or feature importance values. We can pass it the model we want to use and the number of features we want to select. While fitting to our data it will repeat a process where it first fits the internal model and then drops the feature with the weakest coefficient. It will keep doing this until the desired number of features is reached. If we set RFE's verbose parameter higher than zero we'll be able to see that features are dropped one by one while fitting. We could also decide to just keep the 2 features with the highest coefficients after fitting the model once, but this recursive method is safer, since dropping one feature will cause the other coefficients to change.
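A minimal RFE sketch on stand-in data, mirroring the setup in the text (logistic regression as the internal model, keeping 2 of 5 features):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the standardized ANSUR features
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_std = StandardScaler().fit_transform(X)

# Wrap RFE around the model; it refits and drops the weakest feature
# each iteration until only 2 features remain. verbose=1 logs each drop.
rfe = RFE(estimator=LogisticRegression(), n_features_to_select=2, verbose=1)
rfe.fit(X_std, y)
```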

8. Inspecting the RFE results

Once RFE is done we can check the support_ attribute that contains True/False values to see which features were kept in the dataset. Using the zip function once more, we can also check out RFE's ranking_ attribute to see in which iteration a feature was dropped. Values of 1 mean that the feature was kept in the dataset until the end, while high values mean the feature was dropped early on. Finally, we can test the accuracy of the model with just two remaining features, 'chestdepth' and 'neckcircumference'. It turns out the accuracy is still untouched at 99%. This means the other 3 features had little to no impact on the model and its predictions.
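Inspecting support_ and ranking_ looks like this. The feature names and data are again hypothetical stand-ins, so the two surviving features and the final accuracy will differ from the course's 99%:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

features = ["neckcircumference", "chestdepth", "handlength",
            "footlength", "earlength"]  # assumed names
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)

rfe = RFE(estimator=LogisticRegression(), n_features_to_select=2)
rfe.fit(X_train_std, y_train)

print(dict(zip(features, rfe.support_)))  # True = feature was kept
print(dict(zip(features, rfe.ranking_)))  # 1 = kept until the end

# .score() evaluates the internal model on the selected features only
acc = rfe.score(X_test_std, y_test)
print(f"{acc:.1%} accuracy on test set.")
```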

9. Let's practice!

Now it's your turn to use recursive feature elimination.