Exercise

# Logistic regression algorithm

Let's dig into the internals and implement a logistic regression algorithm. Since R's `glm()` function is very complex, you'll stick to implementing simple logistic regression for a single dataset.

Rather than using the sum of squares as the metric, we want to use likelihood. However, the log-likelihood is more computationally stable, so we'll use that instead. There is one more change: since we want to maximize the log-likelihood, but `optim()` defaults to finding minimum values, it is easier to calculate the *negative* log-likelihood.

The log-likelihood contribution of each observation is $$ \log(y_{pred}) \cdot y_{actual} + \log(1 - y_{pred}) \cdot (1 - y_{actual}) $$

The metric to calculate is minus the sum of these log-likelihood contributions.
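
As a rough sketch, this metric could be written as an R function like the one below. The function and argument names are assumptions for illustration, not part of the exercise.

```r
# A minimal sketch of the metric described above. The names are assumptions;
# y_pred is a vector of predicted probabilities, y_actual the observed 0/1 responses.
calc_neg_log_likelihood <- function(y_pred, y_actual) {
  # Per-observation log-likelihood contributions
  log_likelihoods <- log(y_pred) * y_actual + log(1 - y_pred) * (1 - y_actual)
  # Negate the sum so that minimizing this maximizes the likelihood
  -sum(log_likelihoods)
}
```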

The explanatory values (the `time_since_last_purchase` column of `churn`) are available as `x_actual`. The response values (the `has_churned` column of `churn`) are available as `y_actual`.
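
For context, these vectors correspond to extracting the two columns from `churn`. A sketch (in the exercise, both vectors are already defined):

```r
# Sketch only: in the exercise these vectors are predefined.
x_actual <- churn$time_since_last_purchase
y_actual <- churn$has_churned
```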

Instructions 1/3


- Set the intercept to one.
- Set the slope to `0.5`.
- Calculate the predicted y-values as the intercept plus the slope times the actual x-values, all transformed with the logistic distribution CDF.
- Calculate the log-likelihood for each term as the log of the predicted y-values times the actual y-values, plus the log of one minus the predicted y-values times one minus the actual y-values.
- Calculate minus the sum of the log-likelihoods for each term (see the sketch after this list).
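
Here is a minimal sketch of these steps in R, assuming `x_actual` and `y_actual` are defined as described above. `plogis()` is R's logistic distribution CDF.

```r
# Set the coefficients
intercept <- 1
slope <- 0.5
# Predicted responses: the linear predictor transformed with the logistic CDF
y_pred <- plogis(intercept + slope * x_actual)
# Per-observation log-likelihood contributions
log_likelihoods <- log(y_pred) * y_actual + log(1 - y_pred) * (1 - y_actual)
# The metric: minus the sum of the log-likelihoods
-sum(log_likelihoods)
```

In later steps, this calculation would typically be wrapped in a function of the coefficients and passed to `optim()`, which minimizes it by default.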