Create a high-correlation recipe

Once you have identified highly correlated features, you can use the step_corr() recipe step in tidymodels instead of removing them manually. step_corr() does not remove every feature that is correlated with another feature; it tries to remove as few features as possible while keeping the absolute correlation between the remaining features below the threshold. Conceptually, as you saw in the multiple choice exercise, it removes the feature that has the most overlap with any combination of other features. The idea is that the other features contain the same information, so the overlapping information of the removed feature is still represented in those other features.

The tidyverse and tidymodels packages have been loaded for you.
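
To make the behavior described above concrete, here is a minimal sketch on a small synthetic data frame rather than on house_sales_df. The names toy_df, y, x1, x2, and x3 are hypothetical and exist only for illustration: x2 is built as a noisy copy of x1, so the two overlap almost completely, while x3 is independent. With a threshold of 0.7, step_corr() should drop one feature of the correlated pair and keep x3.

```r
library(recipes)  # part of tidymodels; also attaches dplyr, which provides %>%

set.seed(42)
n <- 200
x1 <- rnorm(n)

# Hypothetical toy data: x2 is nearly a copy of x1, x3 is unrelated noise
toy_df <- data.frame(
  y  = rnorm(n),
  x1 = x1,
  x2 = x1 + rnorm(n, sd = 0.1),
  x3 = rnorm(n)
)

# Define the recipe, add the correlation filter, and estimate it with prep()
toy_recipe <- recipe(y ~ ., data = toy_df) %>%
  step_corr(all_numeric_predictors(), threshold = 0.7) %>%
  prep()

# new_data = NULL returns the processed training data
filtered_toy_df <- bake(toy_recipe, new_data = NULL)

# List the column(s) removed by the first (and only) step
removed <- tidy(toy_recipe, number = 1)
print(names(filtered_toy_df))
print(removed)
```

Only one of x1 and x2 is removed, not both, which is what "remove as few features as possible" means in practice: the remaining feature still carries the shared information.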

This exercise is part of the course

Dimensionality Reduction in R

Exercise instructions

  • Create a recipe that uses step_corr() with a threshold of 0.7, applying the step to numeric predictors only.
  • Apply the recipe to house_sales_df and store the filtered data in filtered_house_sales_df.
  • Use tidy() to identify the column or columns that the step_corr() filter removed.

Interactive hands-on exercise

Try solving this exercise by completing the sample code.

# Create a recipe using step_corr to remove numeric predictors correlated > 0.7
corr_recipe <-  
  ___(price ~ ., data = ___) %>% 
  ___(___, ___ = ___) %>% 
  ___(___) 

# Apply the recipe to the data
___ <- 
  ___ %>% 
  ___(new_data = ___)

# Identify the features that were removed
___(___, ___ = ___)