
Reduce data using feature importances

Now that you have created a full random forest model, you will explore feature importance.

Even though random forest models naturally, if implicitly, perform feature selection, it is often advantageous to build a reduced model. A reduced model trains faster, computes predictions faster, and is easier to understand and maintain. There is, of course, always a trade-off between model simplicity and model performance.

In this exercise, you will reduce the data sets. In the next exercise, you will fit a reduced model and compare its performance to the full model. The fitted model rf_fit and the data frames train and test are provided for you.

The tidyverse, tidymodels, and vip packages have been loaded for you.

This exercise is part of the course

Dimensionality Reduction in R


Exercise instructions

  • Use vi() with the rank argument to extract the ten most important features.
  • Add the target variable back to the top feature list.
  • Apply the top feature mask to reduce the data sets.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Extract the top ten features
top_features <- ___ %>% 
  ___(___ = ___) %>% 
  filter(___) %>% 
  pull(Variable)

# Add the target variable to the feature list
top_features <- c(___, "___")

# Reduce and print the data sets
train_reduced <- train[___]
test_reduced <- ___[___]
train_reduced %>% head(5)
test_reduced %>% head(5)
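One possible completion is sketched below. It assumes rf_fit is the fitted random forest from the previous exercise and that the outcome column is named "target" (a hypothetical placeholder; substitute your data's actual target variable). With rank = TRUE, vi() reports each feature's importance as a rank from 1 (most important) upward, so filtering on Importance <= 10 keeps the top ten.

```r
library(tidyverse)
library(vip)

# Extract the ten most important features, ranked by importance
top_features <- rf_fit %>% 
  vi(rank = TRUE) %>%        # returns Variable and Importance as ranks 1..p
  filter(Importance <= 10) %>% 
  pull(Variable)

# Add the target variable back so the reduced data can still be modeled
# ("target" is a hypothetical column name)
top_features <- c(top_features, "target")

# Reduce the data sets to the selected columns and inspect them
train_reduced <- train[top_features]
test_reduced <- test[top_features]
train_reduced %>% head(5)
test_reduced %>% head(5)
```

Keeping the target column in the reduced data sets matters: the next exercise refits the model on train_reduced, which is only possible if the outcome is still present alongside the selected predictors.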