PCA in tidymodels

From a model-building perspective, PCA lets you create models with fewer features while still capturing most of the information in the original data. However, as you've seen, a disadvantage of PCA is that the resulting model is harder to interpret. In this exercise, you will build a linear regression model using a subset of the house sales data. The target variable is price.

A model built directly from the data without extracting principal components has an RMSE of $236,461.4. You will apply PCA with tidymodels and compare the new RMSE. Remember, a lower RMSE is better.

The tidyverse and tidymodels packages have been loaded for you.

Build a PCA recipe using train to extract five principal components.
Fit a workflow with a default linear_reg() model spec.
Create a test prediction data frame using test that contains the actual and predicted values.
Calculate the RMSE for the PCA-reduced linear regression model.
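
The four steps above can be sketched as follows. This is a minimal illustration, not the exercise's reference solution: the simulated house data, its column names, and the object names (pca_recipe, house_fit, pred_df, pca_rmse) are all assumptions made so the code runs on its own. In the exercise, train and test are already provided. Normalizing the predictors before step_pca() is also an assumption here, though it is a common choice because PCA is sensitive to feature scale.

```r
library(tidymodels)

# Hypothetical stand-in data; the exercise supplies its own `train` and `test`.
set.seed(42)
n <- 200
house <- tibble(
  sqft_living = rnorm(n, 2000, 500),
  sqft_lot    = rnorm(n, 8000, 2000),
  bedrooms    = rnorm(n, 3, 1),
  bathrooms   = rnorm(n, 2, 0.5),
  floors      = rnorm(n, 1.5, 0.5),
  grade       = rnorm(n, 7, 1)
) %>%
  mutate(price = 150 * sqft_living + 20000 * grade + rnorm(n, 0, 50000))

split <- initial_split(house, prop = 0.8)
train <- training(split)
test  <- testing(split)

# 1. PCA recipe: scale the predictors, then keep five principal components.
pca_recipe <- recipe(price ~ ., data = train) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_pca(all_numeric_predictors(), num_comp = 5)

# 2. Workflow combining the recipe with a default linear regression spec.
house_fit <- workflow() %>%
  add_recipe(pca_recipe) %>%
  add_model(linear_reg()) %>%
  fit(data = train)

# 3. Data frame holding predicted (.pred) and actual (price) values for test.
pred_df <- predict(house_fit, new_data = test) %>%
  bind_cols(test %>% select(price))

# 4. RMSE of the PCA-reduced model on the test set.
pca_rmse <- rmse(pred_df, truth = price, estimate = .pred)
pca_rmse
```

Because the recipe is part of the workflow, the normalization statistics and PCA loadings are estimated on train only and then applied unchanged to test when predict() is called.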
