Logistic regression algorithm
Let's dig into the internals and implement a logistic regression algorithm. Since statsmodels' logit() function is very complex, you'll stick to implementing simple logistic regression for a single dataset.
Rather than using the sum of squares as the metric, we want to use likelihood. However, the log-likelihood is more computationally stable, so we'll use that instead. There is one more change: since we want to maximize the log-likelihood, but minimize() finds minimum values, it is easier to calculate the negative log-likelihood and minimize that.
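As a quick aside (a toy example, not part of the exercise), maximizing any function with minimize() just means negating it first:

from scipy.optimize import minimize

def g(x):
    # A function whose maximum is at x = 2
    return -(x - 2) ** 2

# minimize() finds minima, so minimize -g to maximize g
result = minimize(fun=lambda params: -g(params[0]), x0=[0])
print(result.x)  # close to [2.]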
The log-likelihood value for each observation is

$$\log(y_{\text{pred}}) \cdot y_{\text{actual}} + \log(1 - y_{\text{pred}}) \cdot (1 - y_{\text{actual}})$$
The metric to calculate is the negative sum of these log-likelihood contributions.
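For intuition, here is a quick numeric check with made-up values: a confident correct prediction contributes a value near zero, while a confident wrong prediction is heavily penalized.

import numpy as np

y_pred = np.array([0.9, 0.9])  # model predicts "churn" with high confidence, twice
y_actual = np.array([1, 0])    # the first customer churned; the second did not

log_likelihood = np.log(y_pred) * y_actual + np.log(1 - y_pred) * (1 - y_actual)
print(log_likelihood)           # [-0.105 -2.303]: near zero vs. heavily penalized
print(-np.sum(log_likelihood))  # about 2.408, the metric to minimize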
The explanatory values (the time_since_last_purchase column of churn) are available as x_actual. The response values (the has_churned column of churn) are available as y_actual. logistic is imported from scipy.stats, and logit() and minimize() are also loaded.
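If you want to run this outside the exercise environment, a minimal setup sketch might look like the following. The churn.csv filename is an assumption for illustration; in the exercise, churn, x_actual, y_actual, and the imports are all pre-loaded for you.

import numpy as np
import pandas as pd
from scipy.stats import logistic
from scipy.optimize import minimize
from statsmodels.formula.api import logit

# Hypothetical file; the exercise provides the churn DataFrame pre-loaded
churn = pd.read_csv("churn.csv")

# Explanatory and response values
x_actual = churn["time_since_last_purchase"]
y_actual = churn["has_churned"]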
The code sample below shows the completed exercise, with each blank filled in.
# Complete the function
def calc_neg_log_likelihood(coeffs):
    # Unpack coeffs into the intercept and slope
    intercept, slope = coeffs
    # Calculate predicted y-values: logistic CDF of the linear predictor
    y_pred = logistic.cdf(intercept + slope * x_actual)
    # Calculate the log-likelihood contribution of each observation
    log_likelihood = np.log(y_pred) * y_actual + np.log(1 - y_pred) * (1 - y_actual)
    # Calculate negative sum of log_likelihood
    neg_sum_ll = -np.sum(log_likelihood)
    # Return negative sum of log_likelihood
    return neg_sum_ll

# Test the function with intercept 10 and slope 1
print(calc_neg_log_likelihood([10, 1]))
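With the function complete, minimize() can search for the intercept and slope that make the data most likely, and logit() can serve as a sanity check. This follow-up is a sketch; the starting guess x0=[0, 0] is an arbitrary choice.

# Optimize: find the coefficients that minimize the negative log-likelihood
result = minimize(fun=calc_neg_log_likelihood, x0=[0, 0])
print(result.x)  # fitted [intercept, slope]

# Sanity check: compare against statsmodels' logit()
mdl_churn = logit("has_churned ~ time_since_last_purchase", data=churn).fit()
print(mdl_churn.params)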