Feature selection vs. feature extraction

1. Feature selection vs. feature extraction

Let's now discuss the two main approaches to reducing dimensions — feature selection and feature extraction.

2. Approaches to dimensionality reduction

A vegetable garden analogy helps to differentiate feature selection and feature extraction. Feature selection is like pulling weeds. Weeds provide little benefit. Feature extraction is like making a salad where we combine the best parts of the plants — the whole head of lettuce, just the carrot roots, and just the tomato fruit. The resulting salad is a composite of the garden.

3. Feature selection

Let's dive deeper into feature selection. Imagine we have six features.

4. Feature selection

We can devise a filter to remove features with little or no information. In our example, the filter identified two low-information features.

5. Feature selection

When we apply the filter, four features are left. F1 and F5 were like weeds — they provided little benefit or information. The advantage of feature selection over feature extraction is that it is relatively easy to understand and implement.

6. Example credit data

Let's write some manual feature selection code. We'll use a sample of the credit score data. Notice how num_bank_accounts and num_credit_card don't appear to vary and outstanding_debt has missing values.

7. Create a zero-variance filter

We can create a filter to remove features that do not vary. First, we pass credit_df to summarize() to calculate the variance of all columns and use pivot_longer() to reorient the data vertically. Then we use filter() to identify columns with zero variance and pull the column names out as a vector.
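The steps just described can be sketched roughly as follows. This is a minimal illustration, not the course's exact code: the small credit_df built here is a hypothetical stand-in with made-up values, and the names zero_var_filter, feature, and variance are assumptions chosen for readability.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for credit_df; the values are invented for illustration
credit_df <- data.frame(
  annual_income         = c(34000, 52000, 78000, 41000),
  num_bank_accounts     = c(3, 3, 3, 3),
  num_credit_card       = c(4, 4, 4, 4),
  outstanding_debt      = c(809.98, NA, 1303.01, NA),
  credit_history_months = c(265, 321, 184, 207)
)

zero_var_filter <- credit_df %>%
  # Variance of every column, ignoring missing values
  summarize(across(everything(), ~ var(.x, na.rm = TRUE))) %>%
  # Reorient to one row per feature
  pivot_longer(everything(), names_to = "feature", values_to = "variance") %>%
  # Keep only the features that never vary
  filter(variance == 0) %>%
  # Extract the feature names as a character vector
  pull(feature)

zero_var_filter
```

On this sample, zero_var_filter holds num_bank_accounts and num_credit_card, the two columns whose values never change.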

8. Create missing values filter

Similarly, we can create a missing values filter. We use summarize() to count the number of NAs in each column, then pivot the results with pivot_longer(). We then filter for columns whose NA count exceeds an arbitrary threshold of zero, and pull the column names into a vector.
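A sketch of the missing values filter, under the same assumptions as before: credit_df is a hypothetical sample with invented values, and na_filter, feature, and na_count are illustrative names rather than the course's own.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for credit_df; the values are invented for illustration
credit_df <- data.frame(
  annual_income         = c(34000, 52000, 78000, 41000),
  num_bank_accounts     = c(3, 3, 3, 3),
  num_credit_card       = c(4, 4, 4, 4),
  outstanding_debt      = c(809.98, NA, 1303.01, NA),
  credit_history_months = c(265, 321, 184, 207)
)

na_filter <- credit_df %>%
  # Count the NAs in every column
  summarize(across(everything(), ~ sum(is.na(.x)))) %>%
  # Reorient to one row per feature
  pivot_longer(everything(), names_to = "feature", values_to = "na_count") %>%
  # Threshold of zero: flag any column with at least one NA
  filter(na_count > 0) %>%
  # Extract the feature names as a character vector
  pull(feature)

na_filter
```

A threshold of zero is strict. Raising it, for example to a share of the row count, would tolerate columns with only a few missing values.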

9. Applying the combined filter

We can combine the zero-variance and missing values filters using the c() function, then use the combined filter to reduce the data's dimensionality. Calling select(-all_of(combined_filter)) removes the three columns with zero variance or missing values, leaving us with the more informative features annual_income and credit_history_months.
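Applying the combined filter might look like the sketch below. To keep it self-contained, the two filter vectors are hard-coded to the column names the filters described above would flag on this hypothetical sample; zero_var_filter, na_filter, and reduced_df are assumed names.

```r
library(dplyr)

# Hypothetical stand-in for credit_df; the values are invented for illustration
credit_df <- data.frame(
  annual_income         = c(34000, 52000, 78000, 41000),
  num_bank_accounts     = c(3, 3, 3, 3),
  num_credit_card       = c(4, 4, 4, 4),
  outstanding_debt      = c(809.98, NA, 1303.01, NA),
  credit_history_months = c(265, 321, 184, 207)
)

# Hard-coded here for brevity; normally these come from the two filter steps
zero_var_filter <- c("num_bank_accounts", "num_credit_card")
na_filter       <- c("outstanding_debt")

# Concatenate the two character vectors into one list of columns to drop
combined_filter <- c(zero_var_filter, na_filter)

# all_of() raises an error if a listed column is absent, which guards against typos
reduced_df <- credit_df %>%
  select(-all_of(combined_filter))

names(reduced_df)
```

Only annual_income and credit_history_months remain in reduced_df.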

10. Feature extraction

Now let's dive deeper into feature extraction using the same six features.

11. Feature extraction

Instead of eliminating features, feature extraction combines information from two or more features to create new features, just like combining parts of vegetable plants can form a delicious salad. Notice how the blue F1 and pink F2 formed a purple F7, and the light gray F5 and dark gray F6 formed a medium gray F8. However, this illustration is not completely accurate.

12. Feature extraction and mutual information

Remember our discussion on mutual information? Mutual information is the information that features share, which makes it redundant.

13. Feature extraction: Combining mutually exclusive info

So the combined features would look more like this, where the purple and the medium gray in the middle of the combined features represent the mutual information.

14. Feature extraction: Combining mutually exclusive info

We want to remove the information they have in common and keep their mutually exclusive information.

15. Advantages and disadvantages of feature extraction

In conclusion, let's discuss the advantages and disadvantages of feature extraction compared to feature selection. Feature extraction can combine information from existing features into new features. This may reveal relationships among the features. However, this ability comes at a price. Feature extraction algorithms tend to be more complicated. This is why we demonstrate the code in a later chapter. In addition, the extracted features tend to be more difficult to interpret. The plot shows how three dimensions (height, weight, and body mass index) are reduced to two dimensions along the x and y axes. The x-axis corresponds perfectly to body mass index. How would we label the y-axis, which is a mix of height and weight?

16. Let's practice!

For now, let's practice.