Lasso Regression
1. Lasso Regression
Welcome back. Let's talk about lasso regression, a type of supervised feature selection.
2. Lasso regression overview
Lasso regression performs supervised feature selection using L1 regularization. The L1 penalty shrinks the regression coefficients, and the least important coefficients are reduced all the way to zero, so lasso naturally performs feature selection. In tidymodels, lasso regression is performed with linear_reg() by setting mixture to one. A mixture of zero performs ridge regression, which we don't cover in this course. The penalty argument determines how many features are kept in the model: larger values keep fewer features. Valid penalty values are zero and above.
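Here's a minimal sketch of that specification (the glmnet engine, introduced later in this lesson, needs the glmnet package installed; the penalty value here is illustrative):

library(tidymodels)

# mixture = 1 applies a pure L1 (lasso) penalty; mixture = 0 would be ridge
lasso_model <- linear_reg(penalty = 0.01, mixture = 1) %>%
  set_engine("glmnet")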
3. Standardize data
We standardize the data so that the penalty affects all feature coefficients similarly. In tidymodels, we standardize the target variable separately from the predictors. For the target variable, scale() returns a matrix, so we convert it to a vector. For the predictors, we use step_normalize() in a recipe. Both approaches normalize the data to a standard deviation of one and a mean of zero.
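Both idioms side by side, using the built-in mtcars data purely for illustration:

# Target: scale() returns a one-column matrix, so coerce it back to a vector
mtcars$mpg <- as.vector(scale(mtcars$mpg))

# Predictors: step_normalize() centers and scales inside a recipe
norm_recipe <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric_predictors())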
4. Choosing a penalty value
The penalty is a hyperparameter, which we will optimize by searching a space of penalty values. In tidymodels, we use tune() to mark the parameters we want to optimize. Here is how we define linear_reg() so we can tune the penalty. Notice penalty equals tune().
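In code, that definition looks like this:

# penalty = tune() flags the penalty as a hyperparameter to optimize later
linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")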
5. Preparing the data
Now let's go through the entire process of creating a workflow to fit a lasso regression model, both with and without tuning the penalty. We'll use a subset of the housing data to illustrate this. To prepare the data, we first scale the target variable price, wrapping scale() in as.vector(). Then we create our training and testing sets as usual.
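A sketch of that preparation; the object names housing, housing_split, housing_train, and housing_test, plus the seed value, are illustrative rather than taken from the lesson:

# Scale the target; as.vector() collapses the matrix that scale() returns
housing$price <- as.vector(scale(housing$price))

# The usual training/testing split
set.seed(123)
housing_split <- initial_split(housing)
housing_train <- training(housing_split)
housing_test <- testing(housing_split)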
6. Create a recipe
Then we create a recipe, applying step_normalize() to all numeric predictors to scale the predictor variables.
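For example (the formula price ~ . is an assumption about the lesson's predictors):

lasso_recipe <- recipe(price ~ ., data = housing_train) %>%
  step_normalize(all_numeric_predictors())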
7. Create the workflow
We create a model spec with linear_reg(), setting mixture to one and using the glmnet engine because it supports lasso regression. Let's set the penalty to 0.01. We create a workflow, passing it lasso_recipe as the preprocessor and lasso_model as the spec.
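Putting those pieces together:

lasso_model <- linear_reg(penalty = 0.01, mixture = 1) %>%
  set_engine("glmnet")

lasso_workflow <- workflow(preprocessor = lasso_recipe, spec = lasso_model)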
8. Fit the workflow
To display the model, we call fit() to train lasso_workflow and pass the result to tidy(). We keep only the non-zero coefficient estimates, removing the coefficients that lasso regression shrank to zero.
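As a sketch, continuing from the objects above:

lasso_fit <- fit(lasso_workflow, data = housing_train)

# Drop the coefficients the L1 penalty reduced to zero
tidy(lasso_fit) %>%
  filter(estimate != 0)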
9. Create a tunable model workflow
To tune the penalty value, we set the penalty argument to tune() in linear_reg() and create the workflow with the updated tunable model. We use vfold_cv() with v set to five to create a cross-validation training sample with five folds. grid_regular() in tidymodels creates a grid, or range, of penalty values to explore. The first argument is the penalty() function, whose range argument takes a two-element vector of base-ten exponents. In this example, we explore a penalty range from 0.001, specified as negative three, meaning ten to the power of negative three, to 0.1, specified as negative one, meaning ten to the power of negative one. We set levels to twenty to generate twenty different penalty values in that range.
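Here's how those pieces look in code (the fold and grid object names are illustrative):

lasso_tune_model <- linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

lasso_tune_workflow <- workflow(preprocessor = lasso_recipe, spec = lasso_tune_model)

# Five-fold cross-validation on the training data
housing_folds <- vfold_cv(housing_train, v = 5)

# Twenty penalty values from 10^-3 = 0.001 to 10^-1 = 0.1
penalty_grid <- grid_regular(penalty(range = c(-3, -1)), levels = 20)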
10. Fit a grid of models
Then we use tune_grid() from tidymodels to fit a model for each of the twenty penalty values. We pass tune_grid() the workflow object, the cross-validation folds, and the penalty grid. We pass the resulting lasso_grid object to autoplot() and specify root mean square error as the metric to see the model performance at each penalty level.
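For example:

lasso_grid <- tune_grid(
  lasso_tune_workflow,
  resamples = housing_folds,
  grid = penalty_grid
)

# Plot RMSE across the twenty penalty values
autoplot(lasso_grid, metric = "rmse")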
11. Penalty performance plot
The resulting plot shows that the RMSE begins to rise steeply once the penalty exceeds 0.01.
12. Finalize the model
We retrieve the penalty value with the lowest RMSE using select_best() and use finalize_workflow() to update the workflow before fitting the best model. Lastly, we view the coefficients of the selected features in the final lasso model.
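A sketch of that final step (the best_penalty, final_workflow, and final_fit names are illustrative):

# Penalty with the lowest cross-validated RMSE
best_penalty <- select_best(lasso_grid, metric = "rmse")

final_workflow <- finalize_workflow(lasso_tune_workflow, best_penalty)
final_fit <- fit(final_workflow, data = housing_train)

# Coefficients of the features lasso kept
tidy(final_fit) %>%
  filter(estimate != 0)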
13. Let's practice!
Let's practice.