Exercise

Logistic regression algorithm

Let's dig into the internals and implement a logistic regression algorithm. Since statsmodels' logit() function is quite complex, you'll stick to implementing simple logistic regression for a single dataset.

Rather than using the sum of squares as the metric, we want to use likelihood. However, the log-likelihood is more computationally stable, so we'll use that instead. There is one more change: since we want to maximize the log-likelihood, but minimize() only finds minima, it is easier to calculate and minimize the negative log-likelihood.

The log-likelihood value for each observation is $$ \log(y_{pred}) \cdot y_{actual} + \log(1 - y_{pred}) \cdot (1 - y_{actual}) $$

The metric to calculate is the negative sum of these log-likelihood contributions.
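To build intuition for this formula, here is a quick numeric check of the per-observation contribution, using a couple of illustrative values (not taken from the churn dataset):

```python
import numpy as np

def log_likelihood_contribution(y_pred, y_actual):
    # Per-observation log-likelihood:
    # log(y_pred) * y_actual + log(1 - y_pred) * (1 - y_actual)
    return np.log(y_pred) * y_actual + np.log(1 - y_pred) * (1 - y_actual)

# A confident, correct prediction contributes a value close to 0 (the maximum)...
print(log_likelihood_contribution(0.9, 1))  # ≈ -0.105
# ...while a confident, wrong prediction contributes a large negative value.
print(log_likelihood_contribution(0.9, 0))  # ≈ -2.303
```

Summing these contributions over all observations and negating gives the metric that minimize() will drive down.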

The explanatory values (the time_since_last_purchase column of churn) are available as x_actual. The response values (the has_churned column of churn) are available as y_actual. logistic is imported from scipy.stats, and logit() and minimize() are also loaded.

Instructions 1/2

Complete the function body.

  • Unpack coeffs to intercept and slope, respectively.
  • Calculate the predicted y-values as the intercept plus the slope times the actual x-values, transformed with the logistic CDF.
  • Calculate the log-likelihood as the log of the predicted y-values times the actual y-values, plus the log of one minus the predicted y-values times one minus the actual y-values.
  • Calculate the negative sum of the log-likelihood.
  • Return the negative sum of the log-likelihood.
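The steps above can be sketched as follows. This is a minimal, self-contained version: in the exercise, x_actual and y_actual are preloaded from the churn dataset, so the synthetic data and the args= pass-through used here are assumptions made purely for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import logistic

def calc_neg_log_likelihood(coeffs, x_actual, y_actual):
    # Unpack coeffs to intercept and slope
    intercept, slope = coeffs
    # Predicted y-values: linear predictor transformed with the logistic CDF
    y_pred = logistic.cdf(intercept + slope * x_actual)
    # Per-observation log-likelihood contributions
    log_likelihood = np.log(y_pred) * y_actual + np.log(1 - y_pred) * (1 - y_actual)
    # Return the negative sum of the log-likelihood
    return -np.sum(log_likelihood)

# Illustrative synthetic data standing in for the churn columns
rng = np.random.default_rng(42)
x_actual = rng.normal(size=400)
y_actual = rng.binomial(1, logistic.cdf(-0.5 + 1.2 * x_actual))

# Minimizing the negative log-likelihood maximizes the likelihood
result = minimize(calc_neg_log_likelihood, x0=[0.0, 0.0],
                  args=(x_actual, y_actual))
print(result.x)  # fitted (intercept, slope)
```

With enough data, the fitted coefficients should land near the values used to generate it, which is a handy sanity check when implementing the metric yourself.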