PCA in tidymodels

From a model-building perspective, PCA lets you create models with fewer features while still capturing most of the information in the original data. However, as you've seen, a disadvantage of PCA is that the resulting model is harder to interpret. In this exercise, you will build a linear regression model using a subset of the house sales data. The target variable is price.

A model built directly from the data, without extracting principal components, has an RMSE of $236,461.4. You will apply PCA with tidymodels and compare the new RMSE against that baseline. Remember, a lower RMSE is better.

The tidyverse and tidymodels packages have been loaded for you.

This exercise is part of the course

Dimensionality Reduction in R

Exercise instructions

  • Build a PCA recipe using train to extract five principal components.
  • Fit a workflow with a default linear_reg() model spec.
  • Create a test prediction data frame using test that contains the actual and predicted values.
  • Calculate the RMSE for the PCA-reduced linear regression model.
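To show what these four steps do under the hood, here is a minimal base R sketch of the same pipeline. It is not the tidymodels solution to the exercise: it uses prcomp() and lm() directly, and the synthetic data and every object name in it (x, price, train_idx, pred_df, and so on) are assumptions made for illustration, not the course's house sales data.

```r
# Synthetic stand-in for the house sales data (hypothetical names and values)
set.seed(42)
n <- 200
x <- matrix(rnorm(n * 8), ncol = 8, dimnames = list(NULL, paste0("x", 1:8)))
price <- as.numeric(x %*% runif(8, 1, 5) + rnorm(n))
train_idx <- seq_len(150)  # first 150 rows for training, the rest for testing

# Fit PCA on the training predictors only, centering and scaling first
pca <- prcomp(x[train_idx, ], center = TRUE, scale. = TRUE)

# Project both sets onto the first five principal components
num_comp <- 5
train_pcs <- data.frame(predict(pca, x[train_idx, ])[, 1:num_comp],
                        price = price[train_idx])
test_pcs <- data.frame(predict(pca, x[-train_idx, ])[, 1:num_comp],
                       price = price[-train_idx])

# Fit a linear regression on the components instead of the raw features
fit <- lm(price ~ ., data = train_pcs)

# Data frame of actual and predicted values for the test set
pred_df <- data.frame(price = test_pcs$price,
                      .pred = predict(fit, newdata = test_pcs))

# RMSE: square root of the mean squared prediction error
rmse <- sqrt(mean((pred_df$price - pred_df$.pred)^2))
rmse
```

Note that the PCA rotation is estimated from the training rows only and then applied unchanged to the test rows, so no information from the test set leaks into the preprocessing.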

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Build a PCA recipe
pca_recipe <- ___(___ ~ ___ , data = ___) %>% 
  ___(___()) %>% 
  ___(___(), num_comp = ___) 

# Fit a workflow with a default linear_reg() model spec
house_sales_fit <- ___(preprocessor = ___, spec = ___()) %>% 
  ___(___)

# Create prediction df for the test set
house_sales_pred_df <- ___(___, test) %>% 
  ___(test %>% select(___))

# Calculate the RMSE
___(house_sales_pred_df, ___, .pred)