
Regularized regression

1. Regularized regression

Now let's explore regularization in regression, a technique used to avoid overfitting.

2. Why regularize?

Recall that fitting a linear regression model minimizes a loss function to choose a coefficient, a, for each feature, and the intercept, b. If we allow these coefficients to be very large, we can get overfitting. Therefore, it is common practice to alter the loss function so that it penalizes large coefficients. This is called regularization.

3. Ridge regression

The first type of regularized regression that we'll look at is called ridge. With ridge, we use the Ordinary Least Squares loss function plus the sum of the squared values of the coefficients, multiplied by a constant, alpha. So, when minimizing the loss function, models are penalized for coefficients with large positive or negative values. When using ridge, we need to choose the alpha value in order to fit and predict. Essentially, we can select the alpha for which our model performs best. Picking alpha for ridge is similar to picking k in KNN. Alpha in ridge is known as a hyperparameter: a variable we set before fitting that governs how the model's parameters are learned. Alpha controls model complexity. When alpha equals zero, we are performing OLS, where large coefficients are not penalized and overfitting may occur. A high alpha means that large coefficients are significantly penalized, which can lead to underfitting.
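
In symbols, a sketch of this loss (using our own notation of a_j for the coefficient of feature j, with n observations and p features) is:

\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \alpha \sum_{j=1}^{p} a_j^2

Setting alpha to zero drops the penalty term and recovers plain OLS.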

4. Ridge regression in scikit-learn

To perform ridge regression in scikit-learn, we import Ridge from sklearn-dot-linear_model. To highlight the impact of different alpha values, we create an empty list for our scores, then loop through a list of different alpha values. Inside the for loop we instantiate Ridge, setting the alpha keyword argument equal to the loop variable, which is also called alpha. We fit on the training data, and predict on the test data. We save the model's R-squared value to the scores list. Finally, outside of the loop, we print the scores for the models with five different alpha values. We see that performance gets worse as alpha increases.
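
A minimal sketch of this loop, assuming the data has already been split into X_train, X_test, y_train, and y_test, and using illustrative alpha values rather than the exact ones from the slide:

from sklearn.linear_model import Ridge

scores = []
for alpha in [0.1, 1.0, 10.0, 100.0, 1000.0]:
    # Instantiate Ridge with the current alpha value
    ridge = Ridge(alpha=alpha)
    # Fit on the training data and predict on the test data
    ridge.fit(X_train, y_train)
    y_pred = ridge.predict(X_test)
    # .score() returns the R-squared value for regressors
    scores.append(ridge.score(X_test, y_test))
print(scores)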

5. Lasso regression

There is another type of regularized regression called lasso, where our loss function is the OLS loss function plus the sum of the absolute values of the coefficients, multiplied by some constant, alpha.
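
As a sketch, in the same notation as the ridge loss above, the lasso objective swaps the squared penalty for an absolute-value penalty:

\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \alpha \sum_{j=1}^{p} |a_j|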

6. Lasso regression in scikit-learn

To use Lasso we import it from sklearn-dot-linear_model. The actual method for performing lasso regression in scikit-learn mirrors ridge regression, as we can see here. Performance drops substantially as alpha goes over 20!
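
A minimal sketch, mirroring the ridge loop above and again assuming X_train, X_test, y_train, y_test, with illustrative alpha values:

from sklearn.linear_model import Lasso

scores = []
for alpha in [1, 10, 20, 50]:
    # Lasso follows the same instantiate-fit-score pattern as Ridge
    lasso = Lasso(alpha=alpha)
    lasso.fit(X_train, y_train)
    scores.append(lasso.score(X_test, y_test))
print(scores)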

7. Lasso regression for feature selection

Lasso regression can actually be used to assess feature importance. This is because it tends to shrink the coefficients of less important features to zero. The features whose coefficients are not shrunk to zero are selected by the lasso algorithm. Let's check this out in practice.

8. Lasso for feature selection in scikit-learn

We import Lasso. Next, we create our feature and target arrays, and use the dataset's dot-columns attribute to access the feature names, storing them in a variable called names. As we are calculating feature importance, we use the entire dataset rather than splitting it. We then instantiate Lasso, setting alpha to zero-point-one. We fit the model to the data and extract the coefficients using the dot-coef-underscore attribute, storing them as lasso_coef. We then plot the coefficients for each feature.
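
A minimal sketch of this workflow; the DataFrame name diabetes_df and the target column name "glucose" are assumptions for illustration:

from sklearn.linear_model import Lasso
import matplotlib.pyplot as plt

# Build the feature and target arrays from the full dataset (no train/test split)
X = diabetes_df.drop("glucose", axis=1).values
y = diabetes_df["glucose"].values
names = diabetes_df.drop("glucose", axis=1).columns

# Fit lasso with alpha set to 0.1 and extract the coefficients
lasso = Lasso(alpha=0.1)
lasso_coef = lasso.fit(X, y).coef_

# One bar per feature; features whose coefficients shrink to zero carry little importance
plt.bar(names, lasso_coef)
plt.xticks(rotation=45)
plt.show()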

9. Lasso for feature selection in scikit-learn

We can see that the most important predictor for our target variable, blood glucose levels, is the binary value for whether an individual has diabetes or not! This is not surprising, but is a great sanity check. This type of feature selection is very important because it allows us to communicate results to non-technical audiences. It is also useful for identifying which factors are important predictors for various physical phenomena.

10. Let's practice!

Now let's apply regularization to our regression models!