1. Predictions and odds ratios
Let's see how to make predictions with your logistic regression model.
2. The ggplot predictions
You've already seen how ggplot2 will give you a glm prediction line.
3. Making predictions
To make a prediction with a logistic model, you use the same technique as for linear models.
Create a data frame or tibble of explanatory variable values.
Then add a response column calculated using predict. There is one change here. As well as passing the model object and the explanatory data to predict, you also need to set the type argument to "response" to get the probabilities of churning.
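Here is a minimal sketch of those steps. The churn data isn't shown in this section, so the code simulates a stand-in dataset; the names churn, time_since_last_purchase, has_churned, and mdl_churn are assumptions made for illustration.

```r
# Simulated stand-in for the churn data (hypothetical names and values)
set.seed(42)
n <- 400
churn <- data.frame(time_since_last_purchase = rnorm(n))
churn$has_churned <- rbinom(n, 1, plogis(0.3 * churn$time_since_last_purchase))

# Logistic regression: a glm with the binomial family
mdl_churn <- glm(has_churned ~ time_since_last_purchase,
                 data = churn, family = binomial)

# Step 1: a data frame of explanatory variable values
explanatory_data <- data.frame(time_since_last_purchase = seq(-1, 6, 0.25))

# Step 2: add a response column; type = "response" gives probabilities
prediction_data <- explanatory_data
prediction_data$has_churned <- predict(mdl_churn, explanatory_data,
                                       type = "response")
head(prediction_data)
```

Every value in the new has_churned column is a probability, so it lies between zero and one.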
4. Adding point predictions
As with the linear case, we can add those predictions onto the plot by setting the data argument of geom_point to the prediction data frame. As expected, these points follow the trend line.
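A sketch of that plot might look like the following. It assumes ggplot2 is installed and again uses a simulated stand-in for the churn data, with hypothetical column names.

```r
library(ggplot2)

# Simulated stand-in for the churn data (hypothetical names and values)
set.seed(42)
n <- 400
churn <- data.frame(time_since_last_purchase = rnorm(n))
churn$has_churned <- rbinom(n, 1, plogis(0.3 * churn$time_since_last_purchase))
mdl_churn <- glm(has_churned ~ time_since_last_purchase,
                 data = churn, family = binomial)

# Prediction data: explanatory values plus predicted probabilities
prediction_data <- data.frame(time_since_last_purchase = seq(-1, 6, 0.25))
prediction_data$has_churned <- predict(mdl_churn, prediction_data,
                                       type = "response")

plt <- ggplot(churn, aes(time_since_last_purchase, has_churned)) +
  geom_point() +
  # Logistic trend line drawn by ggplot2
  geom_smooth(method = "glm", formula = y ~ x, se = FALSE,
              method.args = list(family = binomial)) +
  # Point predictions: same aesthetics, different data argument
  geom_point(data = prediction_data, color = "blue")
plt
```

Because the prediction data frame uses the same column names as the original data, geom_point can reuse the plot's aesthetics and only the data argument changes.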
5. Getting the most likely outcome
One simpler prediction you can make, rather than calculating probabilities of a response, is to calculate the most likely response. That is, if the probability of churning is less than 0.5, the most likely outcome is that they won't churn. If the probability is greater than 0.5, it's more likely that they will churn.
To calculate this, simply round the predicted probabilities.
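As a rough sketch, rounding could be done like this. The data is simulated and the column name most_likely_outcome is an assumption chosen for illustration.

```r
# Simulated stand-in for the churn data (hypothetical names and values)
set.seed(42)
n <- 400
churn <- data.frame(time_since_last_purchase = rnorm(n))
churn$has_churned <- rbinom(n, 1, plogis(0.3 * churn$time_since_last_purchase))
mdl_churn <- glm(has_churned ~ time_since_last_purchase,
                 data = churn, family = binomial)

prediction_data <- data.frame(time_since_last_purchase = seq(-1, 6, 0.25))
prediction_data$has_churned <- predict(mdl_churn, prediction_data,
                                       type = "response")

# Probabilities below 0.5 round to 0 (no churn), above 0.5 round to 1 (churn)
prediction_data$most_likely_outcome <- round(prediction_data$has_churned)
head(prediction_data)
```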
6. Visualizing most likely outcome
We can plot the most likely outcome by using the prediction data and overriding the y aesthetic to use the numbers we just calculated. For recently active customers, the most likely outcome is that they don't churn. Otherwise, the most likely outcome is that they churn.
7. Odds ratios
There is another way to talk about binary responses, commonly used in gambling. The odds ratio, more precisely called the odds, is the probability that something happens, divided by the probability that it doesn't.
For example, a probability of 0.25 is the same as odds of "three to one against", because the probability of the event not happening is 0.75, which is three times as much.
The plot shows the relationship between the two terms.
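To make the arithmetic concrete, here is a small sketch that converts a few probabilities. The helper name odds is an assumption for illustration, not a function from any package.

```r
# Probability of the event divided by the probability of no event
odds <- function(p) p / (1 - p)

probability <- c(0.1, 0.25, 0.5, 0.75, 0.9)
data.frame(probability, odds = odds(probability))

odds(0.25)      # one third, that is, "three to one against"
1 / odds(0.25)  # the event not happening is three times as likely
```

A probability of 0.5 gives a value of exactly one, and the value grows without bound as the probability approaches one.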
8. Calculating odds ratio
We can calculate the odds ratio by dividing the predicted response probability by one minus that number.
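Applied to the prediction data, that calculation might look like this sketch. As before, the data is simulated and the column names, including odds_ratio, are assumptions.

```r
# Simulated stand-in for the churn data (hypothetical names and values)
set.seed(42)
n <- 400
churn <- data.frame(time_since_last_purchase = rnorm(n))
churn$has_churned <- rbinom(n, 1, plogis(0.3 * churn$time_since_last_purchase))
mdl_churn <- glm(has_churned ~ time_since_last_purchase,
                 data = churn, family = binomial)

prediction_data <- data.frame(time_since_last_purchase = seq(-1, 6, 0.25))
prediction_data$has_churned <- predict(mdl_churn, prediction_data,
                                       type = "response")

# Predicted probability divided by one minus that probability
prediction_data$odds_ratio <-
  prediction_data$has_churned / (1 - prediction_data$has_churned)
head(prediction_data)
```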
9. Visualizing odds ratio
It doesn't make sense to visualize odds with the original data points, so we need a new plot.
Here, the dotted line where the odds ratio is one indicates where churning is just as likely as not churning. In the bottom-left, the predicted odds are below one, so the chance of churning is less than the chance of not churning. In the top-right, the chance of churning is about five times the chance of not churning.
10. Visualizing log odds ratio
One nice property of logistic regression odds ratios is that on a log scale, they change linearly with the explanatory variable. This plot adds scale_y_log10.
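A sketch of such a plot, assuming ggplot2 is installed and using simulated data with hypothetical column names:

```r
library(ggplot2)

# Simulated stand-in for the churn data (hypothetical names and values)
set.seed(42)
n <- 400
churn <- data.frame(time_since_last_purchase = rnorm(n))
churn$has_churned <- rbinom(n, 1, plogis(0.3 * churn$time_since_last_purchase))
mdl_churn <- glm(has_churned ~ time_since_last_purchase,
                 data = churn, family = binomial)

prediction_data <- data.frame(time_since_last_purchase = seq(-1, 6, 0.25))
prediction_data$has_churned <- predict(mdl_churn, prediction_data,
                                       type = "response")
prediction_data$odds_ratio <-
  prediction_data$has_churned / (1 - prediction_data$has_churned)

plt <- ggplot(prediction_data, aes(time_since_last_purchase, odds_ratio)) +
  geom_line() +
  # Dotted reference line: churning as likely as not churning
  geom_hline(yintercept = 1, linetype = "dotted") +
  # Log scale on the y-axis turns the curve into a straight line
  scale_y_log10()
plt
```

Removing the scale_y_log10 line gives the curved plot on the original scale.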
11. Calculating log odds ratio
This nice property of the logarithm of odds ratios means the log odds ratio is another common way of describing logistic regression predictions. In fact, predict will return the log odds ratio if you don't specify the type argument, because the default, type = "link", gives predictions on the scale of the linear predictor.
Compare the two different ways that the log odds ratio is calculated here, and make sure you understand the code.
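The slide code isn't reproduced in this transcript, so here is a sketch of what the two calculations could look like. The data is simulated, and the column names log_odds_ratio and log_odds_ratio2 are assumptions.

```r
# Simulated stand-in for the churn data (hypothetical names and values)
set.seed(42)
n <- 400
churn <- data.frame(time_since_last_purchase = rnorm(n))
churn$has_churned <- rbinom(n, 1, plogis(0.3 * churn$time_since_last_purchase))
mdl_churn <- glm(has_churned ~ time_since_last_purchase,
                 data = churn, family = binomial)

explanatory_data <- data.frame(time_since_last_purchase = seq(-1, 6, 0.25))
prediction_data <- explanatory_data
prediction_data$has_churned <- predict(mdl_churn, explanatory_data,
                                       type = "response")
prediction_data$odds_ratio <-
  prediction_data$has_churned / (1 - prediction_data$has_churned)

# Way 1: take the logarithm of the odds ratio
prediction_data$log_odds_ratio <- log(prediction_data$odds_ratio)

# Way 2: call predict without a type argument (the default is "link")
prediction_data$log_odds_ratio2 <- predict(mdl_churn, explanatory_data)

# Compare the two columns, ignoring any names attached to the vectors
all.equal(prediction_data$log_odds_ratio, prediction_data$log_odds_ratio2,
          check.attributes = FALSE)
```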
12. All predictions together
Here are all the values calculated in the prediction dataset. Some column names are abbreviated for better printing. Notice that the log odds ratio is the same in both cases.
13. Comparing scales
Each way of describing responses has different benefits.
Most likely outcome is easiest to understand because the answer is always yes or no, but this lacks precision.
Probabilities and odds ratios are still fairly easy to understand for a data-literate audience. However, the non-linear predictions make it hard to reason about how changes in the explanatory variable will change the response.
Log odds ratio is difficult to interpret for individual values, but the linear relationship with the explanatory variables makes it easy to reason about changes.
14. Let's practice!
Let's make some predictions.