1. Logistic regression and probabilities
So far we've been using logistic regression to make hard predictions, meaning we predict either one class or the other. In this video we'll discuss how to interpret the raw model output of the classifier as a probability.
2. Logistic regression probabilities
We've seen this type of decision boundary several times in the course. The fill color shows what class we would predict for every point in the space.
In Chapter 1 we saw that the scikit-learn logistic regression object can output probabilities with the "predict_proba" function.
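As a quick refresher, here is a minimal sketch of that call. The dataset is a made-up two-feature toy problem standing in for the course data, and the variable names are just placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy two-feature dataset standing in for the course data
X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

lr = LogisticRegression()
lr.fit(X, y)

print(lr.predict(X[:5]))        # hard predictions: one class label per example
print(lr.predict_proba(X[:5]))  # one row per example, one column per class
```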
Let's make the same type of figure but this time showing the probabilities.
3. Logistic regression probabilities
In this figure, the new interpretation of the colors is the predicted probability of the red class. The black line is the old decision boundary, which we can refer to if we need to make definite, or hard, decisions. We can see that this line is where the probabilities cross point-5. In other words, if we're more than 50% sure it's red, we predict red, and if we're less than 50% sure it's red, we predict blue. We can also see that we get more and more confident as we move away from the decision boundary, which sounds reasonable.
In this figure, regularization is effectively disabled because C is very large.
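To make that 50% threshold concrete, here is a small sketch reusing the toy lr, X, and y from above, and assuming the "red" class is encoded as 1: thresholding the predicted probabilities at point-5 reproduces the hard predictions, and a very large C corresponds to this weak-regularization figure.

```python
import numpy as np

# Probability of the "red" class (assumed to be class 1) for every example
probs = lr.predict_proba(X)[:, 1]

# More than 50% sure it's red -> predict red; otherwise predict blue
manual_preds = (probs > 0.5).astype(int)
print(np.array_equal(manual_preds, lr.predict(X)))  # True

# A very large C effectively disables regularization, as in this figure
lr_weak_reg = LogisticRegression(C=1e8).fit(X, y)
```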
4. Logistic regression probabilities
The figure on the right shows what happens when we turn on regularization. First, we see that the coefficients are smaller, as expected. The effect of regularization is that the probabilities are closer to point-5; we don't get to the very dark red or very dark blue on the right-hand figure. In other words, smaller coefficients mean less confident predictions. This fits with our story: regularization is supposed to combat overfitting, and there's a connection between overconfidence and overfitting.
By the way, these figures also answer a question you may have had from the previous chapter. With 2 features, we had 2 coefficients even though you only really need one number to represent the slope of a line. We now have a reason for this: the ratio of the coefficients gives us the slope of the line, and the magnitude of the coefficients gives us our confidence level.
Finally, as you can see, regularization not only affects the confidence, but also the orientation of the boundary.
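A small sketch can make all three points visible at once. It reuses the toy X and y from before; the two C values are arbitrary choices, one very large (weak regularization) and one small (strong regularization).

```python
from sklearn.linear_model import LogisticRegression

for C in [1e8, 0.1]:
    lr_c = LogisticRegression(C=C).fit(X, y)
    w0, w1 = lr_c.coef_[0]

    # Decision boundary: w0*x0 + w1*x1 + intercept = 0, so its slope is -w0/w1
    slope = -w0 / w1

    # Confidence: how far the predicted probabilities get from point-5
    most_confident = lr_c.predict_proba(X)[:, 1].max()

    print(f"C={C}: coefficients=({w0:.2f}, {w1:.2f}), slope={slope:.2f}, "
          f"most confident red probability={most_confident:.3f}")
```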
5. How are these probabilities computed?
So how are these probabilities computed? Like the definite class predictions, they come from the raw model output.
The raw model output can be any number, but probabilities are numbers between 0 and 1. So we need a way to "squash" the raw model output to be between 0 and 1. The sigmoid function takes care of that for us.
Here's what it looks like.
Take a look at the curve: when the raw model output is zero, the probability is point-5, meaning we're right on the boundary. When the raw model output is positive, we would have predicted the positive class, and indeed the probability of the positive class approaches 1. When the raw model output is negative, we would have predicted the negative class, and indeed the probability of the positive class approaches 0, which is another way of saying we're very confident it's the negative class.
Since the raw model output grows as we move away from the boundary, we're more confident in our predictions far away from the boundary.
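Here's a small numerical sketch of that squashing. The sigmoid is written out by hand, and the last two lines reuse the toy lr and X from the earlier snippets to check that applying the sigmoid to the raw model output (decision_function in scikit-learn) reproduces predict_proba for the positive class.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the interval (0, 1)
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))    # 0.5    -> right on the boundary
print(sigmoid(5))    # ~0.993 -> confidently the positive class
print(sigmoid(-5))   # ~0.007 -> confidently the negative class

# The raw model output is what decision_function returns; squashing it
# with the sigmoid gives the predicted probability of the positive class
raw = lr.decision_function(X)
print(np.allclose(sigmoid(raw), lr.predict_proba(X)[:, 1]))  # True
```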
6. Let's practice!
Time to explore these predicted probabilities.