1. Logistic regression and regularization
Welcome to Chapter 3! In this chapter we'll use the practical skills from Chapter 1 and the concepts from Chapter 2 to dig deeper into logistic regression.
2. Regularized logistic regression
The prerequisite course, "Supervised Learning with scikit-learn", mentions that regularization combats overfitting by making the model coefficients smaller. The figure shows the learned coefficients of a logistic regression model with default regularization.
In scikit-learn, the hyperparameter "C" is the inverse of the regularization strength. In other words, larger C means less regularization and smaller C means more regularization. Let's test this out.
3. Regularized logistic regression
The orange curve shows what happens if we use a smaller value of C, which means more regularization for our logistic regression model. As expected, regularization makes the coefficients smaller.
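Here is a minimal sketch of that comparison. The data below is a synthetic stand-in (the course uses its own dataset), and the variable names and C values are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data, for illustration only.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

lr_default = LogisticRegression()          # C=1.0, the default
lr_more_reg = LogisticRegression(C=0.01)   # smaller C -> more regularization

lr_default.fit(X, y)
lr_more_reg.fit(X, y)

# The more regularized model's coefficients are pulled toward zero.
print("C=1.0 : largest |coefficient| =", np.abs(lr_default.coef_).max())
print("C=0.01: largest |coefficient| =", np.abs(lr_more_reg.coef_).max())
```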
4. How does regularization affect training accuracy?
Let's see how regularization influences training and test accuracy.
With the movie review data set already loaded and split into train and test sets, we instantiate two logistic regression models, one with weak regularization and one with strong regularization.
We then fit both models.
Next, we compute training accuracy. The model with weak regularization gets a higher training accuracy.
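A sketch of these steps, using a synthetic train/test split as a stand-in for the movie review data; the model names lr_weak_reg and lr_strong_reg and the specific C values are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the movie review data, split into train and test sets.
X, y = make_classification(n_samples=1000, n_features=100, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

lr_weak_reg = LogisticRegression(C=100, max_iter=1000)  # large C = weak regularization
lr_strong_reg = LogisticRegression(C=0.01)              # small C = strong regularization

lr_weak_reg.fit(X_train, y_train)
lr_strong_reg.fit(X_train, y_train)

# Training accuracy: the weakly regularized model typically scores higher here.
print(lr_weak_reg.score(X_train, y_train))
print(lr_strong_reg.score(X_train, y_train))
```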
Now that we've studied loss functions, we can see why regularization makes the training accuracy go down:
Regularization is an extra term that we add to the original loss function, which penalizes large values of the coefficients. Intuitively, without regularization, we are maximizing the training accuracy, so we do well on that metric. When we add regularization, we're modifying the loss function to penalize large coefficients, which distracts from the goal of optimizing accuracy. The larger the regularization penalty (or the smaller we set C), the more we deviate from our goal of maximizing training accuracy. Hence, training accuracy goes down.
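As a rough sketch, one common way to write the L2-regularized logistic loss is shown below; scikit-learn's actual objective places the factor C on the data-fitting term instead, which is equivalent up to a constant factor:

$$
\mathcal{L}(\boldsymbol{\beta}) \;=\; \underbrace{\sum_{i=1}^{n} \log\!\left(1 + e^{-y_i\, \boldsymbol{x}_i^\top \boldsymbol{\beta}}\right)}_{\text{original logistic loss}} \;+\; \underbrace{\frac{1}{C}\,\lVert \boldsymbol{\beta} \rVert_2^2}_{\text{penalty on large coefficients}}
$$

with labels $y_i \in \{-1, +1\}$; making C smaller increases the weight of the penalty term.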
5. How does regularization affect test accuracy?
Let's look at the test accuracy this time. As we can see, regularization improved it.
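Continuing the earlier sketch (this reuses the fitted models and the test split from above):

```python
# Test accuracy: when the weakly regularized model was overfitting,
# the strongly regularized one will often generalize better,
# which is the effect described here.
print(lr_weak_reg.score(X_test, y_test))
print(lr_strong_reg.score(X_test, y_test))
```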
We discussed why regularization reduces training accuracy, but why does it improve test accuracy? Imagine you did not have access to a particular feature; that's like setting the corresponding coefficient to zero. Regularizing, and thus making your coefficient smaller, is like a compromise between not using the feature at all (setting the coefficient to zero) and fully using it (the un-regularized coefficient value). If using a feature too heavily was causing overfitting, then regularization causes you to "fit less" and thus overfit less.
6. L1 vs. L2 regularization
For linear regression we use the terms Ridge and Lasso for two different types of regularization. The general names for these concepts, outside linear regression, are L1 regularization and L2 regularization.
Everything you learned about ridge (or L2) and lasso (or L1) in the past applies to logistic regression as well. For example, both help reduce overfitting, and L1 also performs feature selection.
As an example, let's train two logistic regression models, with L1 and L2 regularization, on the breast cancer dataset after scaling features, which is usually good practice, especially when using regularization. We can plot the coefficients for both models, adding a grid so that we can see where zero is.
Note the "solver" argument used when creating the model with L1 regularization. This argument controls the optimization method used to find the coefficients. We need to set this here because the default solver is not compatible with L1 regularization.
7. L2 vs. L1 regularization
Here are the plots. As you can see, L1 regularization set many of the coefficients to zero, thus ignoring those features; in other words, it performed feature selection for us. On the other hand, L2 regularization just shrinks the coefficients to be smaller. This is analogous to what happens with Lasso and Ridge regression.
8. Let's practice!
Now it's your turn to explore regularization for logistic regression.