Raw data does not always come in its best shape for analysis. In this opening chapter, you will get a first look at how to transform and create features that enhance your model's performance and interpretability.
In this chapter, you’ll learn that, beyond manually transforming features, you can leverage tools from the tidyverse to engineer new variables programmatically. You’ll explore how this approach improves your models' reproducibility and is especially useful when handling datasets with many features.
You’ll now learn how models often benefit from reducing dimensionality and extracting features from high-dimensional data, including converting text data into numeric values, encoding categorical data, and ranking the predictive power of variables. You’ll explore methods including principal component analysis, kernel principal component analysis, numerical extraction from text, categorical encodings, and variable importance scores.
You’ll wrap up the course by learning about feature engineering and machine learning techniques. You’ll begin by focusing on the problems associated with using all available features in a model and the importance of identifying irrelevant and redundant features and learning to remove these features using embedded methods such as lasso and elastic-net. Next, you’ll explore shrinkage methods such as lasso, ridge, and elastic-net, which can be used to regularize feature weights or select features by setting coefficients to zero. Finally, you’ll finish by focusing on creating an end-to-end feature engineering workflow and reviewing and practicing the previously learned concepts and functions in a small project.