Create a low-variance recipe
The tidymodels packages provide a better way to filter zero-variance and near-zero-variance features with the step_zv() and step_nzv() functions, respectively. These recipe steps identify low-variance features by examining the number of unique values in each feature and the ratio of the frequency of its most common value to that of its second most common value. This approach is more robust than the simple variance cutoff we used previously.
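To make the two criteria concrete, here is a minimal base R sketch of the quantities involved. The helpers freq_ratio() and pct_unique() are hypothetical names written for illustration, not functions from recipes, and the thresholds are assumed to match the documented step_nzv() defaults (freq_cut = 95/5, unique_cut = 10).

```r
# Hypothetical helper: count of the most common value divided by the
# count of the second most common value.
freq_ratio <- function(x) {
  counts <- sort(table(x), decreasing = TRUE)
  if (length(counts) < 2) return(Inf)  # one distinct value: zero variance
  as.numeric(counts[1] / counts[2])
}

# Hypothetical helper: distinct values as a percentage of all observations.
pct_unique <- function(x) {
  100 * length(unique(x)) / length(x)
}

# Toy feature: 98 zeros and 2 ones
x <- c(rep(0, 98), rep(1, 2))

freq_ratio(x)  # 49
pct_unique(x)  # 2

# Assumed thresholds (step_nzv() defaults): flag the feature when the
# frequency ratio is high AND the share of unique values is low.
is_near_zero_variance <- freq_ratio(x) > 95 / 5 && pct_unique(x) < 10
is_near_zero_variance  # TRUE
```

A feature dominated by a single value can still have a non-trivial variance if the rare values are large, which is why a plain variance cutoff can miss it while these two ratios do not.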
In addition, you will use the step_scale() recipe step to normalize the variance of the features. Remember that it is always a good idea to normalize the data so that variances across features are comparable.
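As a rough illustration of what scaling does, the base R sketch below divides each column of a made-up data frame by its standard deviation, which is the operation step_scale() is documented to perform. The data frame toy and its column names are assumptions for this example only.

```r
# Made-up data: two features on very different scales
toy <- data.frame(
  sqft  = c(800, 1200, 1500, 2400, 3100),
  baths = c(1, 1.5, 2, 2.5, 3)
)

# Variances before scaling differ by several orders of magnitude
sapply(toy, var)

# Divide each column by its own standard deviation
scaled <- as.data.frame(lapply(toy, function(col) col / sd(col)))

# After scaling, every column has variance 1 (up to floating point error)
sapply(scaled, var)
```

Note that scaling changes only the spread of each feature. It leaves the number of unique values and the frequency ratio untouched, so it does not change which features a near-zero-variance check would flag.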
The house_sales_df data frame is available for you to use. The target variable is price. The tidyverse and tidymodels packages have also been loaded for you.
This exercise is part of the course Dimensionality Reduction in R.
Exercise instructions
- Define a recipe for a low-variance filter and prepare it using house_sales_df.
- Apply the recipe to house_sales_df and store the filtered data in filtered_house_sales_df.
- Display the features that the recipe filtered in the step_nzv() step.
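The same define, prepare, apply, and inspect workflow can be sketched on a small made-up data frame. This is an illustration under stated assumptions rather than the exercise solution: the data frame toy, its columns, and the choice of two steps are all invented here, and the sketch assumes the recipes package is installed.

```r
library(recipes)

set.seed(1)
# Made-up data: one constant predictor, one near-constant predictor,
# one ordinary predictor, and an outcome y
toy <- data.frame(
  y        = rnorm(100),
  constant = rep(5, 100),
  rare     = c(rep(0, 98), rep(1, 2)),
  varied   = rnorm(100)
)

# Define the recipe and estimate it on the data with prep()
toy_recipe <- recipe(y ~ ., data = toy) %>%
  step_zv(all_predictors()) %>%   # step 1: drops `constant`
  step_nzv(all_predictors()) %>%  # step 2: drops `rare`
  prep()

# Apply the prepared recipe to get the filtered data
filtered_toy <- bake(toy_recipe, new_data = toy)
names(filtered_toy)

# List the features removed by the second step (the near-zero variance step)
tidy(toy_recipe, number = 2)
```

The number argument of tidy() refers to a step's position in the recipe, so it has to match wherever step_nzv() sits in the pipeline.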
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Prepare recipe
low_variance_recipe <- recipe(___ ~ ___, ___ = ___) %>%
  step_zv(___) %>%
  ___(___) %>%
  ___(___) %>%
  prep()

# Apply recipe
filtered_house_sales_df <- ___(___, new_data = ___)

# View list of features removed by the near-zero variance step
tidy(___, number = ___)