PCA in tidymodels
From a model-building perspective, PCA lets you build models with fewer features while still capturing most of the information in the original data. However, as you've seen, a disadvantage of PCA is that the resulting model is harder to interpret, because each principal component is a linear combination of the original features. In this exercise, you will build a linear regression model using a subset of the house sales data. The target variable is price.
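To make "fewer features, most of the information" concrete, the sketch below runs base R's prcomp() on the built-in mtcars data. mtcars is only a stand-in here and is not the course's house sales data. It computes the share of total variance each principal component captures, and how many components are needed to keep at least 90% of it.

```r
# Illustrative only: mtcars stands in for the house sales data
pca <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Eigenvalues (sdev^2) divided by their total = proportion of variance per component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)
round(cum_var, 3)

# Smallest number of components that retains at least 90% of the variance
n_keep <- which(cum_var >= 0.9)[1]
n_keep
```

Typically n_keep is far smaller than the 11 original columns. That gap is the trade-off PCA offers: fewer inputs, at the cost of components that no longer map one-to-one onto named features.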
A linear regression model fit directly to the original features, without extracting principal components, has an RMSE of $236,461.4. You will apply PCA with tidymodels and compare the resulting RMSE. Remember, lower RMSEs are better.
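RMSE (root mean squared error) is the square root of the mean squared difference between actual and predicted values. Because it is in the same units as the target, the baseline above can be read in dollars. The sketch below checks a by-hand calculation against yardstick's rmse() on a few hypothetical toy values. These are not course data.

```r
library(yardstick)
library(tibble)

# Hypothetical toy values: actual prices and model predictions
toy <- tibble(
  price = c(300000, 450000, 520000),
  .pred = c(320000, 430000, 560000)
)

# RMSE by hand: sqrt(mean((actual - predicted)^2))
manual_rmse <- sqrt(mean((toy$price - toy$.pred)^2))

# Same metric via yardstick (truth column first, then the estimate column)
yardstick_rmse <- rmse(toy, truth = price, estimate = .pred)$.estimate

manual_rmse     # about 28284.27
yardstick_rmse  # same value
```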
The tidyverse and tidymodels packages have been loaded for you.
This exercise is part of the course Dimensionality Reduction in R.
Exercise instructions
- Build a PCA recipe using train to extract five principal components.
- Fit a workflow with a default linear_reg() model spec.
- Create a test prediction data frame using test that contains the actual and predicted values.
- Calculate the RMSE for the PCA-reduced linear regression model.
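Before fitting anything, it can help to see what a PCA recipe hands to the model. The sketch below makes several assumptions. It uses a synthetic, hypothetical stand-in for the house sales data; the column names mimic house features, but the values are random. It also assumes the predictors are normalized before PCA, a common choice because PCA is sensitive to each column's scale. Baking the prepped recipe shows that the numeric predictors are replaced by PC1 to PC5, with price left untouched.

```r
library(tidymodels)

# Hypothetical synthetic data; NOT the course's house sales data
set.seed(123)
n <- 400
house_sales <- tibble(
  sqft_living = rnorm(n, 2000, 500),
  sqft_above  = sqft_living * runif(n, 0.6, 1),
  sqft_lot    = rnorm(n, 8000, 2000),
  bedrooms    = rpois(n, 3),
  bathrooms   = rpois(n, 2),
  floors      = sample(1:3, n, replace = TRUE),
  yr_built    = sample(1950:2015, n, replace = TRUE),
  price       = 150 * sqft_living + 20000 * bathrooms + rnorm(n, 0, 50000)
)
house_split <- initial_split(house_sales, prop = 0.8)
train <- training(house_split)

pca_recipe <- recipe(price ~ ., data = train) %>%
  # Assumed preprocessing: center and scale so no single unit dominates PCA
  step_normalize(all_numeric_predictors()) %>%
  # Replace all numeric predictors with their first five principal components
  step_pca(all_numeric_predictors(), num_comp = 5)

# prep() estimates means, SDs, and loadings on train; bake(new_data = NULL)
# returns the processed training set
baked_train <- pca_recipe %>% prep() %>% bake(new_data = NULL)
names(baked_train)
```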
Hands-on interactive exercise
Try this exercise by completing the sample code below.
# Build a PCA recipe
pca_recipe <- ___(___ ~ ___ , data = ___) %>%
___(___()) %>%
___(___(), num_comp = ___)
# Fit a workflow with a default linear_reg() model spec
house_sales_fit <- ___(preprocessor = ___, spec = ___()) %>%
___(___)
# Create prediction df for the test set
house_sales_pred_df <- ___(___, test) %>%
___(test %>% select(___))
# Calculate the RMSE
___(house_sales_pred_df, ___, .pred)
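For reference, here is one possible end-to-end version of the four steps. It is a sketch under stated assumptions, not the official solution. The data frame is a synthetic, hypothetical stand-in, so the RMSE it produces is not comparable to the $236,461.4 baseline. The middle recipe step is assumed to be step_normalize(). The calls used are recipe(), step_pca(), workflow(preprocessor, spec), fit(), predict(), bind_cols(), and yardstick's rmse().

```r
library(tidymodels)

# Hypothetical synthetic data; NOT the course's house sales data
set.seed(123)
n <- 400
house_sales <- tibble(
  sqft_living = rnorm(n, 2000, 500),
  sqft_above  = sqft_living * runif(n, 0.6, 1),
  sqft_lot    = rnorm(n, 8000, 2000),
  bedrooms    = rpois(n, 3),
  bathrooms   = rpois(n, 2),
  floors      = sample(1:3, n, replace = TRUE),
  yr_built    = sample(1950:2015, n, replace = TRUE),
  price       = 150 * sqft_living + 20000 * bathrooms + rnorm(n, 0, 50000)
)
house_split <- initial_split(house_sales, prop = 0.8)
train <- training(house_split)
test  <- testing(house_split)

# Build a PCA recipe that keeps five principal components
pca_recipe <- recipe(price ~ ., data = train) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_pca(all_numeric_predictors(), num_comp = 5)

# Fit a workflow with a default linear_reg() spec (lm engine)
house_sales_fit <- workflow(preprocessor = pca_recipe, spec = linear_reg()) %>%
  fit(train)

# Predictions (.pred) alongside the actual prices for the test set
house_sales_pred_df <- predict(house_sales_fit, test) %>%
  bind_cols(test %>% select(price))

# RMSE of the PCA-reduced model
pca_rmse <- rmse(house_sales_pred_df, price, .pred)
pca_rmse
```

Because the workflow bundles the recipe with the model, predict() applies the normalization and PCA loadings learned on train to test automatically. This avoids leaking test-set information into preprocessing.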