PCA in tidymodels
From a model-building perspective, PCA allows you to create models with fewer features while still capturing most of the information in the original data. However, as you've seen, a disadvantage of PCA is that the resulting model is harder to interpret. In this exercise, you will build a linear regression model using a subset of the house sales data. The target variable is price.
A model built directly from the data, without extracting principal components, has an RMSE of $236,461.4. You will apply PCA with tidymodels and compare the new RMSE. Remember, lower RMSEs are better.
The tidyverse and tidymodels packages have been loaded for you.
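As a reminder of what is being compared, RMSE is the square root of the mean squared difference between actual and predicted values. The following is a minimal sketch on made-up numbers (not the house sales data), assuming yardstick is attached through tidymodels:

```r
library(tidymodels)

# Made-up actual and predicted values, for illustration only
actual    <- c(3, 5, 7)
predicted <- c(2, 5, 9)

# RMSE computed by hand: square the errors, average them, take the square root
rmse_manual <- sqrt(mean((actual - predicted)^2))

# The same quantity through yardstick's vector interface
rmse_yardstick <- rmse_vec(truth = actual, estimate = predicted)

# Both approaches should agree
all.equal(rmse_manual, rmse_yardstick)
```

Because RMSE is expressed in the units of the target, an RMSE on the house sales data reads as a typical prediction error in dollars.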
This exercise is part of the course
Dimensionality Reduction in R
Exercise instructions
- Build a PCA recipe using train to extract five principal components.
- Fit a workflow with a default linear_reg() model spec.
- Create a test prediction data frame using test that contains the actual and predicted values.
- Calculate the RMSE for the PCA-reduced linear regression model.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Build a PCA recipe
pca_recipe <- ___(___ ~ ___ , data = ___) %>%
  ___(___()) %>%
  ___(___(), num_comp = ___)

# Fit a workflow with a default linear_reg() model spec
house_sales_fit <- ___(preprocessor = ___, spec = ___()) %>%
  ___(___)

# Create prediction df for the test set
house_sales_pred_df <- ___(___, test) %>%
  ___(test %>% select(___))

# Calculate the RMSE
___(house_sales_pred_df, ___, .pred)
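The template above follows a common tidymodels pattern: recipe, workflow, predict, metric. The sketch below shows one way that pattern might be filled in. It is not the course's reference solution. Because the house sales data is not available here, it uses the built-in mtcars data as a hypothetical stand-in, with mpg playing the role of price; the split proportion, the seed, and the choice of step_normalize() before step_pca() are assumptions.

```r
library(tidyverse)
library(tidymodels)

# Hypothetical stand-in for the house sales data: mtcars, with mpg as the target
set.seed(42)
car_split <- initial_split(mtcars, prop = 0.8)
train <- training(car_split)
test  <- testing(car_split)

# Build a PCA recipe: scale the predictors, then keep five principal components
pca_recipe <- recipe(mpg ~ ., data = train) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_pca(all_numeric_predictors(), num_comp = 5)

# Fit a workflow with a default linear_reg() model spec
car_fit <- workflow(preprocessor = pca_recipe, spec = linear_reg()) %>%
  fit(train)

# Create a prediction data frame holding predicted and actual values
car_pred_df <- predict(car_fit, test) %>%
  bind_cols(test %>% select(mpg))

# Calculate the RMSE (truth column first, then the prediction column)
rmse(car_pred_df, mpg, .pred)
```

Normalizing before PCA matters because principal components are driven by variance, so predictors on larger scales would otherwise dominate the components.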