Create a low-variance recipe

The tidymodels packages provide a better way to filter zero-variance and near-zero-variance features with the step_zv() and step_nzv() functions, respectively. step_zv() drops features that contain only a single unique value, while step_nzv() identifies low-variance features by examining the percentage of unique values and the ratio of the frequency of the most common value to that of the second most common value in each feature. This approach is more robust than the simple variance cutoff we used previously.
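To make the two filters concrete, here is a minimal sketch on made-up data rather than house_sales_df. The data frame toy_df and its columns are hypothetical and exist only for illustration. The step_nzv() defaults are assumed to be unchanged.

```r
library(tidymodels)

# Hypothetical toy data: `constant` never changes, `rare` is almost constant.
toy_df <- data.frame(
  y        = seq(1, 100),
  varied   = rep(1:10, times = 10),
  constant = rep(5, 100),
  rare     = c(rep(0, 99), 1)
)

toy_recipe <- recipe(y ~ ., data = toy_df) %>%
  step_zv(all_predictors()) %>%   # removes single-valued features
  step_nzv(all_predictors()) %>%  # removes highly unbalanced, low-cardinality features
  prep()

# Features removed by each step (steps are numbered in the order they were added)
tidy(toy_recipe, number = 1)
tidy(toy_recipe, number = 2)

# Columns that survive both filters
names(bake(toy_recipe, new_data = NULL))
```

With this toy data, the zero-variance step should drop constant and the near-zero-variance step should drop rare, leaving varied and the outcome y.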

In addition, you will use the step_scale() recipe step to scale each feature to a standard deviation of one. Remember, it's generally a good idea to normalize the data so that variances are comparable across features.
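As a quick illustration of what scaling does, the sketch below applies step_scale() to a small made-up data frame. The names scale_df, small, and large are hypothetical and not part of the exercise data.

```r
library(tidymodels)

# Hypothetical toy data with features on very different scales.
scale_df <- data.frame(
  y     = c(1, 2, 3, 4, 5),
  small = c(0.1, 0.2, 0.3, 0.4, 0.5),
  large = c(1000, 2000, 3000, 4000, 5000)
)

scaled_df <- recipe(y ~ ., data = scale_df) %>%
  step_scale(all_numeric_predictors()) %>%  # divide each predictor by its standard deviation
  prep() %>%
  bake(new_data = NULL)

# Standard deviation of each scaled predictor
sapply(scaled_df[c("small", "large")], sd)
```

Both predictors should come out with a standard deviation of one, even though their original ranges differ by several orders of magnitude. Note that step_scale() does not center the data, so the means are not shifted to zero.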

The house_sales_df data frame is available for you to use. The target variable is price. The tidyverse and tidymodels packages have also been loaded for you.

This exercise is part of the course

Dimensionality Reduction in R

Exercise instructions

Define a recipe for a low-variance filter and prepare it using house_sales_df.
Apply the recipe to house_sales_df and store the filtered data in filtered_house_sales_df.
Display the features that the recipe filtered in the step_nzv() step.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Prepare recipe
low_variance_recipe <- recipe(___ ~ ___, ___ = ___) %>% 
  step_zv(___) %>% 
  ___(___) %>% 
  ___(___) %>% 
  prep()

# Apply recipe
filtered_house_sales_df <- ___(___, new_data = ___)

# View list of features removed by the near-zero variance step 
tidy(___, number = ___)
