Preprocess

Feature engineering time! Build a recipe that handles variables that are non-informative for modeling but still worth keeping, such as an observation ID, and that deals with missing values. This is also an opportunity to transform some predictors: for example, normalize numeric features and create dummy variables for categorical ones.

The attrition dataset and the train and test splits you created in the previous exercise are available in your environment.

This exercise is part of the course

Feature Engineering in R

Exercise instructions

Normalize all numeric features.
Impute missing values using the knn imputation algorithm.
Create dummy variables for all nominal predictors.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

recipe <- recipe(Attrition ~ ., data = train) %>%
  update_role(...1, new_role = "ID") %>%
  # Normalize all numeric features
  ___(all_numeric_predictors()) %>%
  # Impute missing values using the knn imputation algorithm
  ___(all_predictors()) %>%
  # Create dummy variables for all nominal predictors
  ___(all_nominal_predictors())

recipe
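As a rough illustration of the three instructions above, the sketch below applies them to a small made-up data frame rather than the course's `train` split. The data, the column names (`id`, `Age`, `MonthlyIncome`, `Department`), and the object names `toy_train`, `toy_recipe`, and `baked` are assumptions for illustration only. `step_normalize()`, `step_impute_knn()`, and `step_dummy()` are steps from the recipes package that match the instructions; this is one plausible way to fill the blanks, not necessarily the course's expected solution.

```r
library(recipes)

# Hypothetical toy data standing in for the course's `train` split
set.seed(42)
toy_train <- data.frame(
  id = 1:20,
  Attrition = factor(rep(c("No", "Yes"), times = 10)),
  Age = c(NA, round(runif(19, 20, 60))),        # one missing value to impute
  MonthlyIncome = round(runif(20, 2000, 9000)),
  Department = factor(rep(c("Sales", "HR", "Research", "Sales"), times = 5))
)

toy_recipe <- recipe(Attrition ~ ., data = toy_train) %>%
  # Keep the ID column in the data but exclude it from the predictors
  update_role(id, new_role = "ID") %>%
  # Center and scale all numeric predictors
  step_normalize(all_numeric_predictors()) %>%
  # Fill in missing values from the nearest neighbors
  step_impute_knn(all_predictors()) %>%
  # Turn nominal predictors into indicator columns
  step_dummy(all_nominal_predictors())

# Estimate the steps on the toy data and return the processed training set
baked <- toy_recipe %>%
  prep() %>%
  bake(new_data = NULL)

head(baked)
```

Calling `prep()` estimates the statistics each step needs (means, standard deviations, neighbors, factor levels), and `bake(new_data = NULL)` returns the processed training data, which is a convenient way to check that no missing values remain and that the dummy columns were created.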

Raw data does not always come in its best shape for analysis. In this opening chapter, you will get a first look at how to transform and create features that enhance your model's performance and interpretability.

Exercise 1: What is feature engineering? Exercise 2: A tentative model Exercise 3: Manually engineering a feature Exercise 4: Creating new features using domain knowledge Exercise 5: Setting up your data for analysis Exercise 6: Building a workflow Exercise 7: Increasing the information content of raw data Exercise 8: Identifying missing values Exercise 9: Imputing missing values and creating dummy variables Exercise 10: Fitting and assessing the model Exercise 11: Predicting hotel bookings

In this chapter, you’ll learn that, beyond manually transforming features, you can leverage tools from the tidyverse to engineer new variables programmatically. You’ll explore how this approach improves your models' reproducibility and is especially useful when handling datasets with many features.

Exercise 1: Why transform existing features? Exercise 2: Glancing at your data Exercise 3: Normalizing and log-transforming Exercise 4: Fit and augment Exercise 5: Customize your model assessment Exercise 6: Common feature transformations Exercise 7: Common transformations Exercise 8: Plain recipe Exercise 9: Box-Cox transformation Exercise 10: Yeo-Johnson transformation Exercise 11: Advanced transformations Exercise 12: Baseline Exercise 13: step_poly() Exercise 14: step_percentile() Exercise 15: Who's staying?

You’ll now learn how models often benefit from reducing dimensionality and extracting features from high-dimensional data, including converting text data into numeric values, encoding categorical data, and ranking the predictive power of variables. You’ll explore methods including principal component analysis, kernel principal component analysis, numerical extraction from text, categorical encodings, and variable importance scores.

Exercise 1: Reducing dimensionality Exercise 2: Prepping the stage Exercise 3: Digging into the structure Exercise 4: Percent of variance explained Exercise 5: Visualizing variance explained Exercise 6: Feature hashing Exercise 7: Investigating education field Exercise 8: Into the matrix Exercise 9: Exploring the hashing Exercise 10: Visualizing the hashing Exercise 11: Encoding categorical data using supervised learning Exercise 12: Setting up your workflow Exercise 13: Fitting, augmenting, and assessing Exercise 14: Binding models together Exercise 15: Variable Importance Exercise 16: Create a workflow Exercise 17: Fit and augment Exercise 18: Which is the main predictor?

You’ll wrap up the course by combining feature engineering with machine learning techniques. You’ll begin with the problems that come from using all available features in a model and the importance of identifying and removing irrelevant and redundant features. Next, you’ll explore shrinkage methods such as lasso, ridge, and elastic-net, which can be used to regularize feature weights or, for lasso and elastic-net, to select features by setting coefficients to zero. Finally, you’ll create an end-to-end feature engineering workflow, reviewing and practicing the previously learned concepts and functions in a small project.

Exercise 1: Reducing the model's features Exercise 2: Sifting through variable importance Exercise 3: Assessing model performance using all available predictors Exercise 4: Building a reduced model Exercise 5: Shrinkage methods Exercise 6: Manual regularization with Lasso Exercise 7: Tuning the penalty Exercise 8: Finalizing the model Exercise 9: Putting it all together Exercise 10: Prep and split Exercise 11: Preprocess (current exercise) Exercise 12: Model Exercise 13: Assess Exercise 14: Congratulations!