
Logistic regression: revisited

1. Logistic regression: revisited

Before we can build a logistic regression using text features, we have to transform the text fields into numeric columns. As a result, we might end up with hundreds or even thousands of features, which can make the model quite complex.

2. Complex models and regularization

A complex model can arise in a few scenarios. If we use a very complicated function to explain the relationship of interest, we will inevitably fit the noise in the data. Such a model will not perform well when used to score unseen data; this is also called overfitting. A complex model can also stem from including too many unnecessary features and parameters, especially with transformed text data, where we might create thousands of extra numeric columns, as sketched below. These two sources of complexity often go hand in hand. One way to artificially discourage complex models is regularization. When using regularization, we penalize, or restrict, the model's function.
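To get a feel for how quickly text features pile up, here is a minimal sketch using sklearn's CountVectorizer on a toy corpus; the three reviews are hypothetical stand-ins, and the real review data would produce thousands of columns.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy reviews standing in for the course's text data
reviews = ["great product, loved it",
           "terrible, would not buy again",
           "great value, would buy it again"]

vect = CountVectorizer()
X_text = vect.fit_transform(reviews)

print(X_text.shape)                   # (3 reviews, one column per distinct token)
print(vect.get_feature_names_out())   # every token becomes its own numeric feature
```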

3. Regularization in a logistic regression

Regularization is applied by default in the logistic regression function from sklearn. It uses the so-called L2 penalty; the details are outside the scope of this course, but intuitively it's good to know that the L2 penalty shrinks all the coefficients towards zero, effectively reducing the impact of each feature. The strength of regularization is determined by the parameter C, which takes a default value of 1. Higher values of C correspond to less regularization; in other words, the model will try to fit the data as well as possible. Small values of C correspond to high penalization (or regularization), meaning that the coefficients of the logistic regression will be closer to zero; the model will be less flexible because it will not fit the training data as closely. How do we find the most appropriate value of C? Usually we need to test different values and see which one gives us the best performance on the test data.
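Here is a minimal sketch of trying a few values of C. The dataset is a hypothetical stand-in generated with make_classification rather than the course's review data, so the exact scores are not meaningful; the point is the pattern of comparing train and test performance across C values.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the vectorized review data
X, y = make_classification(n_samples=500, n_features=200, n_informative=20,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

for C in [0.01, 0.1, 1, 10]:
    log_reg = LogisticRegression(C=C, max_iter=1000)
    log_reg.fit(X_train, y_train)
    # Smaller C = stronger regularization: coefficients shrink towards zero
    print(f"C={C}: train accuracy={log_reg.score(X_train, y_train):.3f}, "
          f"test accuracy={log_reg.score(X_test, y_test):.3f}")
```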

4. Predicting a probability vs. predicting a class

Recall that when we trained a logistic regression model, we applied the predict function to the test set to predict the labels. The predict function predicts a class: 0 or 1 if we are working with a binary classifier. However, instead of a class, we can predict a probability using the predict_proba function. We again pass the test dataset as an argument.
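A minimal sketch of the difference, reusing the fitted log_reg and the X_test split from the sketch above (hypothetical stand-ins, not the course data):

```python
# Class predictions: an array of 0s and 1s
y_class = log_reg.predict(X_test)

# Probability predictions: one row per observation, one column per class
y_proba = log_reg.predict_proba(X_test)

print(y_class[:3])
print(y_proba[:3])
```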

5. Predicting a probability vs. predicting a class

This returns an array of probabilities, ordered by class label: first class 0, then class 1. The probabilities for each observation are displayed on a separate row. The first value is the probability that the instance belongs to class 0, and the second that it belongs to class 1. Therefore, when predicting probabilities, it is common to specify right away that we want to extract the probabilities of class 1.
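Continuing the sketch above, slicing out the second column keeps only the probabilities of class 1:

```python
# Column 0 holds the probability of class 0, column 1 the probability of class 1
y_proba_1 = log_reg.predict_proba(X_test)[:, 1]
print(y_proba_1[:5])
```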

6. Model metrics with predicted probabilities

One important thing to know is that we cannot directly apply the accuracy score or the confusion matrix to predicted probabilities. If you do that in sklearn, you will get a ValueError. The reason is that accuracy and the confusion matrix work directly with classes, so if we have predicted probabilities, we first need to encode them as classes. By default, any probability greater than or equal to 0.5 is translated to class 1, and anything below to class 0. However, you can change that threshold depending on your problem. Imagine only 1% of the reviews are positive and you have built a model to predict whether a new review is positive or negative. In that context, you don't want to use 0.5 as the cutoff for class 1; this threshold should be much lower.
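A minimal sketch of thresholding, continuing from the sketches above; the 0.5 cutoff is the conventional default, and you would lower it for heavily imbalanced problems.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# accuracy_score(y_test, y_proba_1) would raise a ValueError: metrics need classes
threshold = 0.5
y_pred = (y_proba_1 >= threshold).astype(int)  # encode probabilities as 0/1 classes

print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```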

7. Let's practice!

Let's apply what we've learned in the exercises!
