Reduce data using feature importances
Now that you have created a full random forest model, you will explore feature importance.
Although random forest models naturally, if implicitly, perform feature selection, it is often advantageous to build a reduced model. A reduced model trains faster, computes predictions faster, and is easier to understand and maintain. There is, of course, always a trade-off between model simplicity and model performance.
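Before reducing anything, it helps to look at the importances the fitted forest already provides. A minimal sketch, assuming `rf_fit` is a fitted random forest (for example, a parsnip `ranger` fit trained with an importance measure enabled):

```r
library(vip)    # vi() and vip()
library(dplyr)  # %>% pipe

# Tabulate feature importances as a tibble with
# Variable and Importance columns
rf_fit %>% vi()

# Plot the ten most important features as a bar chart
rf_fit %>% vip(num_features = 10)
```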
In this exercise, you will reduce the data set. In the next exercise, you will fit a reduced model and compare its performance to the full model. `rf_fit`, `train`, and `test` are provided for you. The `tidyverse`, `tidymodels`, and `vip` packages have been loaded for you.
Exercise instructions
- Use `vi()` with the `rank` parameter to extract the ten most important features.
- Add the target variable back to the top feature list.
- Apply the top feature mask to reduce the data sets.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
```r
# Extract the top ten features
top_features <- ___ %>%
  ___(___ = ___) %>%
  filter(___) %>%
  pull(Variable)

# Add the target variable to the feature list
top_features <- c(___, "___")

# Reduce and print the data sets
train_reduced <- train[___]
test_reduced <- ___[___]
train_reduced %>% head(5)
test_reduced %>% head(5)
```
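For reference, here is one way the blanks could be filled in, shown as a sketch rather than the official solution. With `rank = TRUE`, `vi()` converts raw importance scores into integer ranks in the `Importance` column, with 1 being the most important feature, so keeping rows where `Importance <= 10` selects the top ten. The target column name `"target_var"` is a placeholder; use the actual outcome column of your `train` and `test` data.

```r
# A possible completion; "target_var" is a placeholder for
# the real outcome column name in train and test

# Extract the top ten features: rank = TRUE converts importance
# scores to integer ranks, where 1 is the most important
top_features <- rf_fit %>%
  vi(rank = TRUE) %>%
  filter(Importance <= 10) %>%
  pull(Variable)

# Add the target variable to the feature list
top_features <- c(top_features, "target_var")

# Reduce and print the data sets
train_reduced <- train[top_features]
test_reduced <- test[top_features]
train_reduced %>% head(5)
test_reduced %>% head(5)
```

Indexing a data frame with a character vector, as in `train[top_features]`, keeps only the named columns, which is what produces the reduced data sets.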