Logistic regression algorithm

Let's dig into the internals and implement a logistic regression algorithm. Since the statsmodels logit() function is very complex, you'll stick to implementing simple logistic regression for a single dataset.

Rather than using the sum of squares as the metric, we want to use likelihood. However, log-likelihood is more computationally stable, so we'll use that instead. There is one more change: we want to maximize the log-likelihood, but minimize() finds minimum values, so it is easier to calculate the negative log-likelihood and minimize that.
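To see why the log-likelihood is more stable, here is a minimal sketch using made-up predicted probabilities (the array `probs` is hypothetical, not the churn data). Multiplying many probabilities underflows to zero in floating point, while summing their logs stays a perfectly ordinary number.

```python
import numpy as np

# Hypothetical per-observation likelihoods: 2000 observations, each with probability 0.5
probs = np.full(2000, 0.5)

# The raw likelihood is the product of the probabilities; it underflows to 0.0
likelihood = np.prod(probs)

# The log-likelihood is the sum of the log probabilities; it stays finite
log_likelihood = np.sum(np.log(probs))

print(likelihood)
print(log_likelihood)
```

Once the likelihood has collapsed to zero, an optimizer cannot tell a good set of coefficients from a bad one, which is why the sum of logs is the more practical metric.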

The log-likelihood value for each observation is: $$ \log(y_{pred}) \cdot y_{actual} + \log(1 - y_{pred}) \cdot (1 - y_{actual}) $$

The metric to calculate is the negative sum of these log-likelihood contributions.
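As a small worked illustration of the formula and the negative sum, the sketch below uses four invented observations (`y_actual_demo` and `y_pred_demo` are hypothetical values, not taken from churn). When the actual value is 1, only the first term contributes; when it is 0, only the second term does.

```python
import numpy as np

# Hypothetical actual responses and predicted probabilities
y_actual_demo = np.array([1, 0, 1, 0])
y_pred_demo = np.array([0.9, 0.2, 0.6, 0.5])

# Per-observation log-likelihood contributions
log_likelihood = (
    np.log(y_pred_demo) * y_actual_demo
    + np.log(1 - y_pred_demo) * (1 - y_actual_demo)
)

# The metric: negative sum of the contributions
neg_sum_ll = -np.sum(log_likelihood)

print(log_likelihood)
print(neg_sum_ll)
```

Here the four contributions are log(0.9), log(0.8), log(0.6) and log(0.5), so the metric equals -log(0.9 * 0.8 * 0.6 * 0.5). Better predictions push each contribution toward zero and make the metric smaller.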

The explanatory values (the time_since_last_purchase column of churn) are available as x_actual. The response values (the has_churned column of churn) are available as y_actual. logistic is imported from scipy.stats, and logit() and minimize() are also loaded.
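The text does not spell out how logistic is used, so the following is a sketch under an assumption: predicted probabilities are obtained by passing intercept + slope * x through the logistic cumulative distribution function, `logistic.cdf()`. The names `x_demo`, `intercept_demo` and `slope_demo` are hypothetical.

```python
import numpy as np
from scipy.stats import logistic

# Hypothetical explanatory values and coefficients
x_demo = np.array([-2.0, 0.0, 2.0])
intercept_demo, slope_demo = 0.0, 1.0

# Assumed prediction step: logistic CDF of the linear predictor
y_pred_demo = logistic.cdf(intercept_demo + slope_demo * x_demo)

# The same calculation written out by hand, for comparison
manual = 1 / (1 + np.exp(-(intercept_demo + slope_demo * x_demo)))

print(y_pred_demo)
print(manual)
```

The standard logistic CDF is 1 / (1 + exp(-x)), so both arrays agree, and a linear predictor of zero maps to a probability of 0.5.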

This exercise is part of the course

Intermediate Regression with statsmodels in Python


Hands-on interactive exercise

Try this exercise by completing this sample code.

# Complete the function
def calc_neg_log_likelihood(coeffs):
    # Unpack coeffs
    ____, ____ = ____
    # Calculate predicted y-values
    y_pred = ____
    # Calculate log-likelihood
    log_likelihood = ____
    # Calculate negative sum of log_likelihood
    neg_sum_ll = ____
    # Return negative sum of log_likelihood
    return ____

# Test the function with intercept 10 and slope 1
print(calc_neg_log_likelihood([10, 1]))
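For reference, here is one possible way the pieces could fit together, offered as a sketch rather than the official solution. The churn dataset is not available here, so `x_actual` and `y_actual` are replaced with synthetic stand-ins generated from assumed coefficients (-0.5 and 1.5). The import of minimize from scipy.optimize and the use of `logistic.cdf()` for predictions are also assumptions.

```python
import numpy as np
from scipy.stats import logistic
from scipy.optimize import minimize

# Synthetic stand-ins for the churn columns (assumption: not the real data)
rng = np.random.default_rng(0)
x_actual = rng.normal(size=200)
true_prob = logistic.cdf(-0.5 + 1.5 * x_actual)
y_actual = (rng.uniform(size=200) < true_prob).astype(int)

def calc_neg_log_likelihood(coeffs):
    # Unpack coeffs
    intercept, slope = coeffs
    # Predicted probabilities from the logistic CDF of the linear predictor
    y_pred = logistic.cdf(intercept + slope * x_actual)
    # Per-observation log-likelihood contributions
    log_likelihood = np.log(y_pred) * y_actual + np.log(1 - y_pred) * (1 - y_actual)
    # Negative sum, so that minimizing it maximizes the log-likelihood
    return -np.sum(log_likelihood)

# Test the function with intercept 10 and slope 1
print(calc_neg_log_likelihood([10, 1]))

# Search for the coefficients that minimize the metric, starting from zeros
result = minimize(fun=calc_neg_log_likelihood, x0=[0, 0])
print(result.x)
```

On the synthetic data, the minimized metric is lower than the value at intercept 10 and slope 1, which is the behavior the exercise is building toward. With the real churn columns, the numbers will differ.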
