Loss function diagrams

1. Loss function diagrams

This video, on loss function diagrams, is probably the most conceptually difficult video in the course. But when we're done, we'll see exactly what logistic regression and linear SVMs are, which will lead us, in future chapters, to understand why they behave differently. Hang in there!

2. The raw model output

We want to draw loss functions, so let's set up a plot with the loss on the vertical axis. On the horizontal axis we'll plot the raw model output. Since we predict using the sign of the raw model output, the plot is divided into two halves: in the left half we predict one class (call it -1), and in the right half we predict the other class (call it +1). For concreteness, let's focus on a training example in class +1. Then the right half represents correct predictions and the left half represents incorrect predictions.
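To make this concrete, here's a minimal Python sketch of how a linear classifier turns an example into a raw model output and then a prediction. The coefficients, intercept, and example are made up purely for illustration.

```python
import numpy as np

# Hypothetical coefficients, intercept, and training example,
# chosen only to illustrate the computation.
w = np.array([1.0, -2.0])
b = 0.5
x = np.array([2.0, 1.0])

raw = w @ x + b            # raw model output: dot product plus intercept
prediction = np.sign(raw)  # right half of the plot: +1, left half: -1
print(raw, prediction)     # 0.5 1.0
```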

3. 0-1 loss diagram

Let's now draw our 0-1 loss onto the picture. By the definition of the 0-1 loss, incorrect predictions, or mistakes, get a penalty of 1 and correct predictions get no penalty. It's important to distinguish this diagram from the decision boundary plots from earlier: here, the axes aren't two features, but rather the raw model output and the loss, regardless of how many features we have. Also, keep in mind that this picture shows the loss for a single training example: to get the total loss, we need to sum the contributions from all examples.
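As a sketch, here is the 0-1 loss for one example written out in Python; the edge case where the raw output is exactly zero is counted as a mistake here, though that convention varies.

```python
def zero_one_loss(raw, y=1):
    """0-1 loss for one example with true label y (+1 or -1):
    no penalty for a correct prediction, a penalty of 1 otherwise."""
    return 0 if raw * y > 0 else 1

print(zero_one_loss(2.5))   # 0: correct prediction for a +1 example
print(zero_one_loss(-0.3))  # 1: wrong sign, so a penalty of 1
```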

4. Linear regression loss diagram

Using this type of diagram, we can draw the loss from least squares linear regression. As the name implies, it is a squared, or quadratic, function. In linear regression, the raw model output is the prediction itself. Intuitively, the further the prediction is from the true target value, which we're assuming is 1 here, the higher the loss. While this intuition makes sense for linear regression, it doesn't make sense for a linear classifier: for us, being really close to the true value doesn't matter, as long as we get the sign right. We can see the problem in the picture. The left arm of the curve is fine: the loss is large for wrong answers. But the right arm is problematic: if the raw model output is large and positive, the loss grows large even though we're correctly predicting +1. Since we fit a model by minimizing the loss, this means perfectly good models are considered "bad" by the loss. This is why we need specialized loss functions for classification, and can't just use the squared error from linear regression. Let's now look at the logistic loss used in logistic regression.
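The problem is easy to reproduce numerically. In this sketch, the squared loss for a +1 example penalizes a confidently correct raw output of 4 exactly as much as a clearly wrong output of -2:

```python
def squared_loss(raw, y=1):
    # least squares loss: treats the raw model output as the prediction
    return (raw - y) ** 2

print(squared_loss(-2.0))  # 9.0: wrong prediction, large loss (sensible)
print(squared_loss(4.0))   # 9.0: correct sign, yet penalized just as much
```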

5. Logistic loss diagram

This is the logistic loss. You can think of it as a smooth version of the 0-1 loss. It has the property we want: as you move to the right, towards the zone of correct predictions, the loss goes down. We lose the interpretation of the total loss as the number of mistakes, but, unlike the 0-1 loss, this one is easy to minimize in practice.
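For a +1 example, the logistic loss is log(1 + e^(-raw)); a quick sketch shows it falling smoothly as the raw output grows:

```python
import numpy as np

def logistic_loss(raw):
    # logistic loss for a +1 example: log(1 + exp(-raw))
    return np.log(1 + np.exp(-raw))

for r in (-2.0, 0.0, 2.0):
    print(r, logistic_loss(r))  # roughly 2.13, 0.69, 0.13: decreasing
```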

6. Hinge loss diagram

This is the hinge loss, used in SVMs.
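For a +1 example, the hinge loss is max(0, 1 - raw): zero once the raw output clears 1, and growing linearly as predictions get worse. A minimal sketch:

```python
def hinge_loss(raw):
    # hinge loss for a +1 example: max(0, 1 - raw)
    return max(0.0, 1.0 - raw)

print(hinge_loss(-1.0))  # 2.0: wrong prediction, linearly growing penalty
print(hinge_loss(0.5))   # 0.5: correct sign, but not by a margin of 1
print(hinge_loss(2.0))   # 0.0: confidently correct, no penalty
```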

7. Hinge loss diagram

The general shape is the same as for the logistic loss: both penalize incorrect predictions. These loss function diagrams capture the essence of logistic regression and SVMs. There are certainly a lot more details to come but we've made a lot of progress, too.
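If you'd like to recreate these diagrams yourself, here is a short matplotlib sketch that overlays the two losses for a +1 example:

```python
import numpy as np
import matplotlib.pyplot as plt

raw = np.linspace(-3, 3, 200)
plt.plot(raw, np.log(1 + np.exp(-raw)), label="logistic loss")
plt.plot(raw, np.maximum(0, 1 - raw), label="hinge loss")
plt.axvline(0, color="gray", linestyle=":")  # decision threshold
plt.xlabel("raw model output")
plt.ylabel("loss")
plt.legend()
plt.show()
```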

8. Let's practice!

Let's delve into these new loss functions.