How logistic regression works

1. How logistic regression works

Let's see how logistic regression works. The principle is the same as for linear regression: choose a metric that measures how far the predicted responses are from the actual responses, and optimize that metric.

2. Sum of squares doesn't work

In the linear regression case, the metric to optimize was the sum of squares. That is, you calculated each predicted response minus the corresponding actual response, squared it, then took the sum. In the case of logistic regression, the actual response is always either zero or one, and the predicted response is between these two values. It turns out that the sum of squares metric optimizes poorly under these restrictions, and that there is a better metric.
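
As a quick sketch of that linear regression metric, assuming two hypothetical NumPy arrays of actual and predicted responses:

import numpy as np

# Hypothetical example data from a linear regression.
y_actual = np.array([1.2, 2.3, 3.1, 4.4])
y_pred = np.array([1.0, 2.5, 3.0, 4.6])

# Sum of squares: square each prediction error, then sum them.
sum_of_squares = np.sum((y_pred - y_actual) ** 2)
print(sum_of_squares)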

3. Likelihood

This is the likelihood metric. Unlike the sum of squares, where the goal was to find the minimum possible value, with likelihood you want to find the maximum value. You take the product of the predicted and actual responses,

4. Likelihood

and add the product of one minus the predicted responses and one minus the actual responses.

5. Likelihood

Then you sum over all data points. Since the actual response only has two possible values, this equation simplifies in two different ways. When the actual response is one, the equation for each observation simplifies to the predicted response, y_pred. As y_pred increases, the metric increases too, and the maximum likelihood occurs when y_pred is one, the same as the actual value. When the actual response is zero, the equation simplifies to one minus the predicted response. As y_pred decreases, the metric increases, and the maximum likelihood occurs when y_pred is zero. In either case, you get a higher likelihood score when the predicted response is close to the actual response.
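
As a minimal sketch of that calculation, assuming hypothetical arrays y_actual of zeros and ones and y_pred of predicted probabilities:

import numpy as np

# Hypothetical example data: actual responses are zero or one,
# predicted responses lie between zero and one.
y_actual = np.array([0, 0, 1, 1])
y_pred = np.array([0.2, 0.4, 0.7, 0.9])

# Each observation contributes y_pred * y_actual + (1 - y_pred) * (1 - y_actual);
# summing over all data points gives the likelihood.
likelihood = np.sum(y_pred * y_actual + (1 - y_pred) * (1 - y_actual))
print(likelihood)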

6. Log-likelihood

When calculating the likelihood, y_pred is often close to zero or one, so you end up adding lots of very small numbers, which introduces numerical error. It is better behaved numerically to compute the log-likelihood. The only difference in this equation is that you take the logarithm of the predicted response terms. Maximizing the log-likelihood gives the same coefficients as maximizing the likelihood.
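
Continuing the sketch, the only change is taking the log of each predicted response term:

import numpy as np

y_actual = np.array([0, 0, 1, 1])
y_pred = np.array([0.2, 0.4, 0.7, 0.9])

# Same structure as the likelihood, with logs of the predicted response terms.
log_likelihood = np.sum(np.log(y_pred) * y_actual + np.log(1 - y_pred) * (1 - y_actual))
print(log_likelihood)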

7. Negative log-likelihood

We want to maximize the likelihood, but the optimize package can only minimize functions, so one final tweak is to calculate the negative log-likelihood. That is, include a minus sign when you sum each observation's likelihood contribution.
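
In code, that minus sign is the only change from the log-likelihood sketch:

import numpy as np

y_actual = np.array([0, 0, 1, 1])
y_pred = np.array([0.2, 0.4, 0.7, 0.9])

# The minus sign turns maximizing the log-likelihood into a minimization problem.
neg_log_likelihood = -np.sum(np.log(y_pred) * y_actual + np.log(1 - y_pred) * (1 - y_actual))
print(neg_log_likelihood)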

8. Logistic regression algorithm

Now we are set to write our logistic regression algorithm. The metric function takes a coefficients argument; you extract the intercept and slope from it and perform some further calculation that you'll see in the exercises. You'll then use the minimize function from scipy's optimize package to find the coefficients that minimize the metric.
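
Putting the pieces together, a minimal sketch might look like the following. The example data, the use of scipy.special.expit as the logistic transformation inside the metric, and the starting guess of zeros are assumptions for illustration; the exact calculation is covered in the exercises.

import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # the logistic (sigmoid) function

# Hypothetical example data: one explanatory variable, binary response.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y_actual = np.array([0, 0, 1, 0, 1, 1])

def calc_neg_log_likelihood(coeffs):
    # Extract the intercept and slope from the coefficients argument.
    intercept, slope = coeffs
    # Assumed step: transform the linear predictor with the logistic function
    # so the predicted responses lie between zero and one.
    y_pred = expit(intercept + slope * x)
    # Return the negative log-likelihood, the metric to minimize.
    return -np.sum(np.log(y_pred) * y_actual + np.log(1 - y_pred) * (1 - y_actual))

# minimize() searches for the coefficients that minimize the metric.
result = minimize(fun=calc_neg_log_likelihood, x0=[0.0, 0.0])
print(result.x)  # estimated intercept and slope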

9. Let's practice!

Almost there!