
Regularized linear regression

1. Regularized linear regression

So far, we've focused on how to reduce dimensionality using classification algorithms. Let's see what we can do with regression.

2. Linear model concept

To refresh how linear regression works, we'll build a model that derives the linear function between three input values and a target. However, we'll be creating the feature dataset and the linear function ourselves, so that we control the ground truth our model tries to derive.

3. Creating our own dataset

We create three features x1, x2, and x3

4. Creating our own dataset

that all follow a simple normal distribution.

5. Creating our own dataset

We can then create our own target y with a function of our choice. Let's say that y equals 20, plus 5 times x1, plus 2 times x2, plus zero times x3, plus an error term. The 20 at the start is called the intercept; 5, 2, and 0 are the coefficients of our features, which determine how big an effect each feature has on the target. The third feature has a coefficient of zero and will therefore have no effect on the target whatsoever. It would be best to remove it from the dataset, since it could confuse a model and make it overfit. Now that we've set the ground truth for this dataset, let's see if a model can derive it.
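This setup can be sketched in a few lines of Python. This is an illustrative reconstruction, not the course's own code: the sample size, error scale, and random seed are arbitrary choices.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
n = 1000

# Three features, each drawn from a standard normal distribution
x1 = np.random.normal(size=n)
x2 = np.random.normal(size=n)
x3 = np.random.normal(size=n)

# Ground truth: intercept 20, coefficients 5, 2 and 0, plus a small error term
error = np.random.normal(scale=1.0, size=n)
y = 20 + 5 * x1 + 2 * x2 + 0 * x3 + error

# Collect the features in a DataFrame so they can be fed to scikit-learn
X = pd.DataFrame({'x1': x1, 'x2': x2, 'x3': x3})
```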

6. Linear regression in Python

When you fit a LinearRegression() model with scikit-learn, the model object will have a .coef_ (coefficient) attribute containing a NumPy array with one element per input feature. These are the three values we just set to 5, 2, and 0, and the model was able to estimate them quite accurately. The same goes for the intercept, stored in .intercept_.
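A rough sketch of that workflow, reusing the X and y built above (the 70/30 split and random seed are illustrative assumptions):

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hold out a test set so we can evaluate the model on unseen data later
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

lr = LinearRegression()
lr.fit(X_train, y_train)

print(lr.coef_)       # one value per feature, roughly [5, 2, 0]
print(lr.intercept_)  # roughly 20
```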

7. Linear regression in Python

To check how accurate the model's predictions are, we can calculate the R-squared value on the test set. This tells us how much of the variance in the target our model can explain. Our model scores an impressive 97.6%.
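With scikit-learn this is a one-liner, since .score() returns R-squared for regressors (the exact value will depend on the simulated data, so 97.6% is only what the course slides show):

```python
# R-squared on the held-out test set
r_squared = lr.score(X_test, y_test)
print(f'R-squared: {r_squared:.3f}')
```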

8. Linear regression in Python

However, the third feature, which had no effect whatsoever, was estimated to have a small effect of -0.05. If there were more of these irrelevant features, the model could overfit. To solve this, we need to look at what the model actually does while training.

9. Loss function: Mean Squared Error

The model will try to find optimal values for the intercept and coefficients by minimizing a loss function.

10. Loss function: Mean Squared Error

This function is the mean of the squared differences between actual and predicted values, the gray squares in the plot. Minimizing this mean squared error (MSE) makes the model as accurate as possible. However, we don't want our model to be super accurate on the training set if that means it no longer generalizes to new data.
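In symbols, for n samples with actual values y_i and predictions ŷ_i:

```latex
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```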

11. Adding regularization

To avoid this we can introduce regularization. The model will then not only try to be as accurate as possible by minimizing the MSE,

12. Adding regularization

it will also try to keep the model simple by keeping the coefficients low. The strength of regularization can be tweaked with

13. Adding regularization

alpha: when it's too low, the model might overfit; when it's too high, the model might become too simple and inaccurate. One linear model that includes this type of regularization is called Lasso, for least absolute shrinkage and selection operator.
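Conceptually, the Lasso objective adds an L1 penalty on the coefficients β_j to the MSE (scikit-learn's implementation uses a slightly different scaling of the MSE term, but the idea is the same):

```latex
\text{Loss} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
            + \alpha \sum_{j=1}^{p} \lvert \beta_j \rvert
```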

14. Lasso regressor

When we fit it on our dataset, we see that it indeed reduced the coefficient of the third feature to zero, ignoring it, but also that it reduced the other coefficients, resulting in a lower R-squared.
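A sketch of that fit, reusing the train/test split from before and Lasso's default regularization strength:

```python
from sklearn.linear_model import Lasso

la = Lasso()                       # default alpha=1.0
la.fit(X_train, y_train)

print(la.coef_)                    # third coefficient shrunk all the way to 0
print(la.score(X_test, y_test))    # lower R-squared than the plain linear model
```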

15. Lasso regressor

To avoid this we can change the alpha parameter. When we set it to 0.05, the third feature is still ignored, but the other coefficients are reduced less and our R-squared is up again.
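The same fit with the weaker penalty (again only a sketch; the exact scores depend on the simulated data):

```python
# A weaker penalty: the irrelevant feature stays at 0, the others shrink less
la = Lasso(alpha=0.05)
la.fit(X_train, y_train)

print(la.coef_)
print(la.score(X_test, y_test))    # R-squared back up, close to the unregularized model
```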

16. Let's practice!

Now it's your turn to apply regularized linear regression.
