Filtering by variable importance

The attrition dataset contains 839 observations and 30 predictors for "Attrition". You want to explore the trade-off between the performance of a model that uses all available predictors and a reduced model based on a few informative variables.

In this exercise, you fit a model and inspect the variable importance of the fitted model. In the next exercise, you will compare the performance of this model with that of a reduced model.

The train and test splits and the vip package are available in your environment, along with a pre-declared logistic regression model.
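The exercise environment is prepared for you. A minimal sketch of how a comparable setup could be built, assuming the attrition data shipped with the modeldata package stands in for the course data (the course's copy, with 839 observations, may be a subset or a different version), a hypothetical 75/25 split stratified on Attrition, and the glm engine for the logistic regression:

```r
library(tidymodels)  # attaches rsample, parsnip, recipes, workflows, dplyr, ...
library(vip)         # variable importance plots

# Assumption: modeldata::attrition stands in for the course data
data(attrition, package = "modeldata")

# Hypothetical split: 75% training, 25% testing, stratified on the outcome
set.seed(123)
split <- initial_split(attrition, prop = 0.75, strata = Attrition)
train <- training(split)
test  <- testing(split)

# Logistic regression specification, playing the role of the pre-declared `model`
model <- logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification")
```

Stratifying on Attrition keeps the proportion of leavers similar in both splits, which matters because the outcome classes are imbalanced.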

This exercise is part of the course

Feature engineering in R


Exercise instructions

Create a recipe that models Attrition using all the predictors.
Fit the workflow to the training data.
Use the fit_full object to plot the variable importance of your model.
Apply the extract_fit_parsnip() function before vip(), so that vip() receives the fitted model it needs.


Sample code for this exercise:

library(tidymodels)  # recipe(), workflow(), fit(), extract_fit_parsnip()
library(vip)         # vip()

# Assumes `train` and the logistic regression spec `model` already exist

# Create a recipe that models Attrition using all the predictors
recipe_full <- recipe(Attrition ~ ., data = train)

workflow_full <- workflow() %>%
  add_model(model) %>%
  add_recipe(recipe_full)

# Fit the workflow to the training data
fit_full <- workflow_full %>%
  fit(data = train)

# Plot the variable importance of the fitted model;
# extract_fit_parsnip() pulls the fitted parsnip model out of the workflow for vip()
fit_full %>%
  extract_fit_parsnip() %>%
  vip(aesthetics = list(fill = "steelblue"))
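The bar chart from vip() is drawn from a table of importance scores. A minimal sketch of retrieving those scores with vip's vi() function and keeping the highest-ranked terms, assuming modeldata::attrition stands in for the course data, a hypothetical 75/25 split, and an arbitrary cutoff of ten terms. For a glm-based logistic regression, vi() scores each coefficient by the absolute value of its test statistic, and factor predictors appear as separate dummy-coded terms (for example OverTimeYes), so the names do not always match the original column names.

```r
library(tidymodels)
library(vip)

# Assumption: modeldata::attrition stands in for the course data
data(attrition, package = "modeldata")
set.seed(123)
split <- initial_split(attrition, prop = 0.75, strata = Attrition)
train <- training(split)

# Full model: logistic regression on all predictors, as in the exercise
fit_full <- workflow() %>%
  add_model(logistic_reg() %>% set_engine("glm")) %>%
  add_recipe(recipe(Attrition ~ ., data = train)) %>%
  fit(data = train)

# vi() returns the scores that vip() plots, one row per model term
importance <- fit_full %>%
  extract_fit_parsnip() %>%
  vi()

# Keep the ten highest-ranked terms (an arbitrary cutoff for illustration)
top_terms <- importance %>%
  slice_max(Importance, n = 10, with_ties = FALSE) %>%
  pull(Variable)

print(top_terms)
```

Mapping dummy-coded terms back to their source columns is a separate step before such a list can drive a reduced recipe.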


Raw data does not always come in its best shape for analysis. In this opening chapter, you will get a first look at how to transform and create features that enhance your model's performance and interpretability.

Exercise 1: What is feature engineering?
Exercise 2: A tentative model
Exercise 3: Manually engineering a feature
Exercise 4: Creating new features using domain knowledge
Exercise 5: Setting up your data for analysis
Exercise 6: Building a workflow
Exercise 7: Increasing the information content of raw data
Exercise 8: Identifying missing values
Exercise 9: Imputing missing values and creating dummy variables
Exercise 10: Fitting and assessing the model
Exercise 11: Predicting hotel bookings

In this chapter, you’ll learn that, beyond manually transforming features, you can leverage tools from the tidyverse to engineer new variables programmatically. You’ll explore how this approach improves your models' reproducibility and is especially useful when handling datasets with many features.

Exercise 1: Why transform existing features?
Exercise 2: Glancing at your data
Exercise 3: Normalizing and log-transforming
Exercise 4: Fit and augment
Exercise 5: Customize your model assessment
Exercise 6: Common feature transformations
Exercise 7: Common transformations
Exercise 8: Plain recipe
Exercise 9: Box-Cox transformation
Exercise 10: Yeo-Johnson transformation
Exercise 11: Advanced transformations
Exercise 12: Baseline
Exercise 13: step_poly()
Exercise 14: step_percentile()
Exercise 15: Who's staying?

You’ll now learn how models often benefit from reducing dimensionality and extracting features from high-dimensional data, including converting text data into numeric values, encoding categorical data, and ranking the predictive power of variables. You’ll explore methods including principal component analysis, kernel principal component analysis, numerical extraction from text, categorical encodings, and variable importance scores.

Exercise 1: Reducing dimensionality
Exercise 2: Prepping the stage
Exercise 3: Digging into the structure
Exercise 4: Percent of variance explained
Exercise 5: Visualizing variance explained
Exercise 6: Feature hashing
Exercise 7: Investigating education field
Exercise 8: Into the matrix
Exercise 9: Exploring the hashing
Exercise 10: Visualizing the hashing
Exercise 11: Encoding categorical data using supervised learning
Exercise 12: Setting up your workflow
Exercise 13: Fitting, augmenting, and assessing
Exercise 14: Binding models together
Exercise 15: Variable Importance
Exercise 16: Create a workflow
Exercise 17: Fit and augment
Exercise 18: Which is the main predictor?

You’ll wrap up the course by learning about feature engineering and machine learning techniques. You’ll begin with the problems that arise from using all available features in a model, why it matters to identify irrelevant and redundant features, and how to remove them with embedded methods. Next, you’ll explore shrinkage methods such as lasso, ridge, and elastic-net, which regularize feature weights and, in the case of lasso and elastic-net, select features by setting coefficients to zero. Finally, you’ll build an end-to-end feature engineering workflow and review and practice the concepts and functions from earlier chapters in a small project.

Exercise 1: Reducing the number of model features
Exercise 2: Filtering by variable importance (current exercise)
Exercise 3: Assessing model performance with all available predictors
Exercise 4: Building a reduced model
Exercise 5: Shrinkage methods
Exercise 6: Manual regularization with lasso
Exercise 7: Tuning the penalty
Exercise 8: Finalizing the model
Exercise 9: Putting it all together
Exercise 10: Preparing and splitting
Exercise 11: Preprocessing
Exercise 12: Model
Exercise 13: Assessing
Exercise 14: Congratulations!