Reduce data using feature importances
Now that you have created a full random forest model, you will explore feature importance.
Although random forest models naturally, if implicitly, perform feature selection, it is often advantageous to build a reduced model. A reduced model trains faster, computes predictions faster, and is easier to understand and maintain. There is, of course, always a trade-off between model simplicity and model performance.
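Before reducing anything, it helps to look at the importances the fitted forest already provides. A minimal sketch, assuming `rf_fit` is a fitted random forest (for example, a parsnip `ranger` fit trained with an importance measure enabled):

```r
library(vip)    # vi() and vip()
library(dplyr)  # %>% pipe

# Tabulate feature importances as a tibble with
# Variable and Importance columns
rf_fit %>% vi()

# Plot the ten most important features as a bar chart
rf_fit %>% vip(num_features = 10)
```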
In this exercise, you will reduce the data set. In the next exercise, you will fit a reduced model and compare its performance to the full model. `rf_fit`, `train`, and `test` are provided for you. The `tidyverse`, `tidymodels`, and `vip` packages have been loaded for you.
Exercise instructions
- Use `vi()` with the `rank` parameter to extract the ten most important features.
- Add the target variable back to the top feature list.
- Apply the top feature mask to reduce the data sets.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
```r
# Extract the top ten features
top_features <- ___ %>%
  ___(___ = ___) %>%
  filter(___) %>%
  pull(Variable)

# Add the target variable to the feature list
top_features <- c(___, "___")

# Reduce and print the data sets
train_reduced <- train[___]
test_reduced <- ___[___]
train_reduced %>% head(5)
test_reduced %>% head(5)
```
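For reference, here is one way the blanks could be filled in, shown as a sketch rather than the official solution. With `rank = TRUE`, `vi()` converts raw importance scores into integer ranks in the `Importance` column, with 1 being the most important feature, so keeping rows where `Importance <= 10` selects the top ten. The target column name `"target_var"` is a placeholder; use the actual outcome column of your `train` and `test` data.

```r
# A possible completion; "target_var" is a placeholder for
# the real outcome column name in train and test

# Extract the top ten features: rank = TRUE converts importance
# scores to integer ranks, where 1 is the most important
top_features <- rf_fit %>%
  vi(rank = TRUE) %>%
  filter(Importance <= 10) %>%
  pull(Variable)

# Add the target variable to the feature list
top_features <- c(top_features, "target_var")

# Reduce and print the data sets
train_reduced <- train[top_features]
test_reduced <- test[top_features]
train_reduced %>% head(5)
test_reduced %>% head(5)
```

Indexing a data frame with a character vector, as in `train[top_features]`, keeps only the named columns, which is what produces the reduced data sets.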