
Logistic regression algorithm

Let's dig into the internals and implement a logistic regression algorithm. Since statsmodels' `logit()` function is very complex internally, you'll stick to implementing simple logistic regression for a single dataset.

Rather than using the sum of squares as the metric, we want to use likelihood. However, the log-likelihood is more computationally stable, so we'll use that instead. There is one more change: since we want to maximize the log-likelihood but `minimize()` finds minimum values, it is easier to calculate the negative log-likelihood.
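As a quick illustration of that last point (this toy example is not part of the exercise), `minimize()` takes a function and a starting guess and searches for the input that gives the smallest output:

```python
from scipy.optimize import minimize

# Toy example: the minimum of (x - 3)^2 is at x = 3.
result = minimize(fun=lambda x: (x[0] - 3) ** 2, x0=[0])
print(result.x)  # approximately [3.]
```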

The log-likelihood value for each observation is $$ \log(y_{\text{pred}}) \cdot y_{\text{actual}} + \log(1 - y_{\text{pred}}) \cdot (1 - y_{\text{actual}}) $$

The metric to calculate is the negative sum of these log-likelihood contributions.
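Spelling the metric out in full, the function we will ask `minimize()` to work on is

$$ -\sum_{i=1}^{n} \Big[ \log(y_{\text{pred},i}) \, y_{\text{actual},i} + \log(1 - y_{\text{pred},i}) \, (1 - y_{\text{actual},i}) \Big], \qquad y_{\text{pred},i} = \operatorname{cdf}(\beta_0 + \beta_1 \, x_{\text{actual},i}) $$

where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\operatorname{cdf}$ is the logistic CDF (which is why `logistic` is available).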

The explanatory values (the `time_since_last_purchase` column of `churn`) are available as `x_actual`. The response values (the `has_churned` column of `churn`) are available as `y_actual`. `logistic` is imported from `scipy.stats`, and `logit()` and `minimize()` are also loaded.
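For concreteness, the preloaded environment can be sketched roughly as follows. The file path here is a placeholder, and I'm assuming `logit()` comes from `statsmodels.formula.api` and `minimize()` from `scipy.optimize`, as is standard with these libraries:

```python
import pandas as pd
from scipy.stats import logistic           # logistic.cdf() gives predicted probabilities
from scipy.optimize import minimize        # assumed source of minimize()
from statsmodels.formula.api import logit  # assumed source of logit()

# Hypothetical load of the churn dataset; the real exercise preloads it.
churn = pd.read_csv("churn.csv")
x_actual = churn["time_since_last_purchase"]
y_actual = churn["has_churned"]
```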

This exercise is part of the course Intermediate Regression with statsmodels in Python.


Hands-on interactive exercise

Try this exercise by completing the sample code.

# Complete the function
def calc_neg_log_likelihood(coeffs):
    # Unpack coeffs
    ____, ____ = ____
    # Calculate predicted y-values
    y_pred = ____
    # Calculate log-likelihood
    log_likelihood = ____
    # Calculate negative sum of log_likelihood
    neg_sum_ll = ____
    # Return negative sum of log_likelihood
    return ____

# Test the function with intercept 10 and slope 1
print(calc_neg_log_likelihood([10, 1]))
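For reference, here is one way the blanks can be filled in: a sketch that assumes the setup above, followed by the `minimize()` call that recovers the coefficients.

```python
import numpy as np

def calc_neg_log_likelihood(coeffs):
    # Unpack coeffs
    intercept, slope = coeffs
    # Calculate predicted y-values via the logistic CDF
    y_pred = logistic.cdf(intercept + slope * x_actual)
    # Calculate the log-likelihood contribution of each observation
    log_likelihood = np.log(y_pred) * y_actual + np.log(1 - y_pred) * (1 - y_actual)
    # Calculate negative sum of log_likelihood
    neg_sum_ll = -np.sum(log_likelihood)
    # Return negative sum of log_likelihood
    return neg_sum_ll

# Test the function with intercept 10 and slope 1
print(calc_neg_log_likelihood([10, 1]))

# Minimizing the metric should give coefficients close to those from
# logit("has_churned ~ time_since_last_purchase", data=churn).fit().params
print(minimize(fun=calc_neg_log_likelihood, x0=[0, 0]))
```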