Sifting through variable importance
The attrition dataset contains 839 observations and 30 predictors for the outcome "Attrition." You are interested in exploring the trade-off between the performance of a model that uses all available predictors and that of a reduced model based on a few informative variables.
In this exercise, you'll fit a model and have a look at the variable importance of this fitted model. In the following exercise, you'll assess model performance using this model compared to using a reduced model.
The train and test splits and the vip() package are available in your environment, along with a predeclared logistic regression model.
This exercise is part of the course Feature Engineering in R.
Exercise instructions
- Create a recipe that models Attrition using all predictors.
- Fit the workflow to the training data.
- Use the fit_full object to graph the variable importance of your model. Apply the extract_fit_parsnip() function before vip() to feed it the required information.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create a recipe that models Attrition using all the predictors
recipe_full <- ___(___, data = train)

workflow_full <- workflow() %>%
  add_model(model) %>%
  add_recipe(recipe_full)

# Fit the workflow to the training data
fit_full <- ___ %>%
  ___(data = train)

# Use the fit_full object to graph the variable importance of your model
# Apply extract_fit_parsnip() before vip()
fit_full %>% ___() %>%
  ___(aesthetics = list(fill = "steelblue"))
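One way the blanks could be filled in is sketched below. This assumes the tidymodels and vip packages, the train split, and the predeclared logistic regression specification `model` are available in your environment, as the exercise states.

```r
# Assumed to be attached already in the exercise environment
library(tidymodels)
library(vip)

# Create a recipe that models Attrition using all the predictors
recipe_full <- recipe(Attrition ~ ., data = train)

# Bundle the model specification and the recipe into a workflow
workflow_full <- workflow() %>%
  add_model(model) %>%
  add_recipe(recipe_full)

# Fit the workflow to the training data
fit_full <- workflow_full %>%
  fit(data = train)

# Extract the underlying parsnip fit, then graph variable importance
fit_full %>%
  extract_fit_parsnip() %>%
  vip(aesthetics = list(fill = "steelblue"))
```

extract_fit_parsnip() is needed because vip() expects a fitted model object, not a workflow; the aesthetics argument passes fill color through to the underlying ggplot2 bars.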