Linear classifiers: the coefficients

1. Linear classifiers: prediction equations

Welcome to Chapter 2. This chapter is much more conceptual than the other chapters, because we'll be laying the foundation for understanding logistic regression and SVMs. We'll start off by exploring some math behind linear classifiers in this video. By really digging into the details, you'll be better equipped to compare these classifiers to other models and interpret the results.

2. Dot Products

We'll start by defining a dot product. Let's create some numpy arrays x and y. To take the dot product between them, we need to multiply them element-wise. The result is 0 (from 0 times 3), 4 (from 1 times 4), and 10 (from 2 times 5). The sum of these numbers, also known as the dot product, is 14. A convenient notation for this in recent Python versions is the "at" symbol. x@y gives us the same result. In math notation, this is written "x dot y". You can think of a dot product as multiplication in higher dimensions, since x and y are arrays of values.
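For reference, here is a minimal sketch of the computation described above, using the same array values 0, 1, 2 and 3, 4, 5 from the narration:

import numpy as np

x = np.array([0, 1, 2])
y = np.array([3, 4, 5])

# Element-wise product: array([ 0,  4, 10])
print(x * y)

# Summing the element-wise products gives the dot product: 14
print(np.sum(x * y))

# The @ operator (Python 3.5+) computes the same dot product: 14
print(x @ y)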

3. Linear classifier prediction

Using dot products, we can express how linear classifiers make predictions. First, we compute what we'll call the "raw model output", which is the dot product of the coefficients and the features, plus an intercept. We'll then take the sign of this quantity, in other words, we'll check if it's positive or negative. This is a key equation in the course. Crucially, this pattern is the same for both logistic regression and linear SVMs. In scikit-learn terms, we can say logistic regression and linear SVM have different fit functions but the same predict function. The differences in "fit" relate to loss functions, which are coming later in this chapter.
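As a minimal sketch of this prediction rule (the coefficient, intercept, and feature values here are purely illustrative, not taken from a fitted model):

import numpy as np

coefficients = np.array([1.0, -2.0])  # illustrative learned coefficients
intercept = -1.0                      # illustrative learned intercept
features = np.array([3.0, 0.5])       # feature values for one example

# Raw model output: dot product of coefficients and features, plus the intercept
raw = coefficients @ features + intercept  # 1.0*3.0 + (-2.0)*0.5 - 1.0 = 1.0

# The sign of the raw model output determines the predicted class
prediction = 1 if raw > 0 else 0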

4. How LogisticRegression makes predictions

Let's see this equation in action with scikit-learn's breast cancer classification data set. We create a logistic regression object, fit it to the data, and look at the predictions for examples 10 and 20, which are 0 and 1, respectively.
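In code, this looks roughly as follows (a sketch assuming scikit-learn's load_breast_cancer loader; the predicted values 0 and 1 are the ones reported in the narration):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

lr = LogisticRegression()
lr.fit(X, y)

# Predictions for examples 10 and 20: 0 and 1, respectively
print(lr.predict(X[10:11]))
print(lr.predict(X[20:21]))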

5. How LogisticRegression makes predictions (cont.)

Let's now dig deeper. We can get the learned coefficients and intercept with lr.coef_ and lr.intercept_. Let's compute the raw model output for example 10. It's negative: that's why we predict the negative class, called "0" in this data set. On the other hand, for example 20 the raw model output is positive: so we predict the other class, called "1" in this data set. In general, this is what the predict function does for any X: it computes the raw model output, checks if it's positive or negative, and then returns a result based on the names of the classes in your data set, in this case 0 and 1.
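A sketch of that computation, continuing from the fitted lr above (the signs of the outputs are as described in the narration; the exact values depend on the fit):

# Learned coefficients and intercept
print(lr.coef_)
print(lr.intercept_)

# Raw model output for example 10: negative, so the prediction is class 0
print(lr.coef_ @ X[10] + lr.intercept_)

# Raw model output for example 20: positive, so the prediction is class 1
print(lr.coef_ @ X[20] + lr.intercept_)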

6. The raw model output

Let's look at our prediction equation visually. This figure shows an example in 2 dimensions, with the raw model output labeled at a few locations. As we move away from the boundary on one side, the output becomes more and more negative. On the other side, it becomes more and more positive. So the sign, positive or negative, tells you what side of the decision boundary you're on, and thus your prediction. Along the decision boundary itself, the raw model output is zero. Furthermore, the values of the coefficients and intercept determine the boundary.
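To make the boundary equation concrete: with coefficients w and intercept b, the decision boundary is the set of points x where the raw model output w@x + b equals zero. A small illustrative sketch in two dimensions (the numbers are made up, not taken from the figure):

import numpy as np

w = np.array([1.0, 1.0])  # illustrative coefficients
b = -1.0                  # illustrative intercept

# A point on the boundary: raw model output is 0
print(w @ np.array([0.5, 0.5]) + b)    # 0.0

# A point on one side: raw model output is positive
print(w @ np.array([2.0, 2.0]) + b)    # 3.0

# A point on the other side: raw model output is negative
print(w @ np.array([-1.0, -1.0]) + b)  # -3.0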

7. The raw model output

For example, here I changed the intercept of the boundary. The raw model output at the same 3 points has changed because we changed the intercept. The boundary shifted down and left. To change the orientation of the boundary, we can change the coefficients.

8. The raw model output

Here we're looking at different coefficients. This changes the orientation of the decision boundary. In fact, the three points we were looking at are now all along the boundary.
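Continuing the illustrative 2D sketch from above, changing the intercept shifts the boundary, while changing the coefficients changes its orientation (again, these numbers are illustrative, not the ones in the figure):

# Same coefficients, different intercept: the raw output at a fixed point changes,
# which corresponds to the boundary shifting
print(w @ np.array([2.0, 2.0]) + b)          # 3.0 with the original intercept
print(w @ np.array([2.0, 2.0]) + (b - 2.0))  # 1.0 after shifting the intercept

# Different coefficients, same intercept: the boundary's orientation changes
w_new = np.array([1.0, -1.0])
print(w_new @ np.array([2.0, 2.0]) + b)      # -1.0: the same point is now on the other side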

9. Let's practice!

Let's continue exploring these effects.