
Predictions and odds

1. Predictions and odds

Let's see how to make predictions with your logistic regression model.

2. The regplot() predictions

You've already seen how regplot() gives you a logistic regression trend line.
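As a reminder, here is a minimal sketch of that trend line. It assumes a hypothetical churn-like dataset generated at random, with assumed column names `time_since_last_purchase` and `has_churned`; it is not the lesson's actual data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the churn data: one standardized explanatory
# variable and a binary (0/1) response. Column names are assumptions.
rng = np.random.default_rng(42)
recency = rng.normal(0, 1, 400)
true_prob = 1 / (1 + np.exp(-(-0.5 + 0.6 * recency)))
churn = pd.DataFrame({
    "time_since_last_purchase": recency,
    "has_churned": rng.binomial(1, true_prob),
})

# logistic=True makes regplot fit a logistic regression (via statsmodels)
# instead of a straight line; ci=None skips the bootstrapped confidence band.
ax = sns.regplot(
    x="time_since_last_purchase",
    y="has_churned",
    data=churn,
    ci=None,
    logistic=True,
)
plt.show()
```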

3. Making predictions

To make a prediction with a logistic model, you use the same technique as for linear models. Create a DataFrame of explanatory variable values. Then add a response column calculated using the predict method.
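A minimal sketch of those two steps with statsmodels' formula interface. The data is randomly generated, and the column names and the model name `mdl_churn` are hypothetical.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import logit

# Hypothetical churn-like data (column names are assumptions)
rng = np.random.default_rng(42)
recency = rng.normal(0, 1, 400)
true_prob = 1 / (1 + np.exp(-(-0.5 + 0.6 * recency)))
churn = pd.DataFrame({
    "time_since_last_purchase": recency,
    "has_churned": rng.binomial(1, true_prob),
})

# Fit the logistic regression; disp=0 silences the optimizer's progress output
mdl_churn = logit("has_churned ~ time_since_last_purchase", data=churn).fit(disp=0)

# 1. A DataFrame of explanatory values to predict for
explanatory_data = pd.DataFrame({"time_since_last_purchase": np.arange(-1, 2.25, 0.25)})

# 2. Add a response column from predict(); for a logit model these are probabilities
prediction_data = explanatory_data.assign(
    has_churned=mdl_churn.predict(explanatory_data)
)
print(prediction_data.head())
```

Because the response is binary, the predicted `has_churned` values are probabilities between 0 and 1 rather than 0/1 labels.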

4. Adding point predictions

As with the linear case, we can add those predictions onto the plot by creating a scatter plot with prediction_data as the data argument. As expected, these points follow the trend line.
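The overlay can be sketched as follows, again assuming hypothetical random data and column names. The red points should sit on the regplot trend line, since both come from a maximum-likelihood logistic fit to the same data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from statsmodels.formula.api import logit

# Hypothetical churn-like data (column names are assumptions)
rng = np.random.default_rng(42)
recency = rng.normal(0, 1, 400)
true_prob = 1 / (1 + np.exp(-(-0.5 + 0.6 * recency)))
churn = pd.DataFrame({
    "time_since_last_purchase": recency,
    "has_churned": rng.binomial(1, true_prob),
})

# Fit the model and predict on a grid of explanatory values
mdl_churn = logit("has_churned ~ time_since_last_purchase", data=churn).fit(disp=0)
explanatory_data = pd.DataFrame({"time_since_last_purchase": np.arange(-1, 2.25, 0.25)})
prediction_data = explanatory_data.assign(has_churned=mdl_churn.predict(explanatory_data))

fig, ax = plt.subplots()
# Original data plus the logistic trend line
sns.regplot(x="time_since_last_purchase", y="has_churned", data=churn,
            ci=None, logistic=True, ax=ax)
# Predicted probabilities drawn as red points on the same axes
sns.scatterplot(x="time_since_last_purchase", y="has_churned",
                data=prediction_data, color="red", ax=ax)
plt.show()
```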

5. Getting the most likely outcome

One simpler prediction you can make, rather than calculating probabilities of a response, is to calculate the most likely response. That is, if the probability of churning is less than 0.5, the most likely outcome is that the customer won't churn. If the probability is greater than 0.5, it's more likely that they will churn. To calculate this, simply round the predicted probabilities using numpy's round() function.
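A minimal sketch of the rounding step. The probabilities below are made-up values standing in for the output of predict().

```python
import numpy as np
import pandas as pd

# Hypothetical predicted churn probabilities, standing in for predict() output
prediction_data = pd.DataFrame({
    "time_since_last_purchase": [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5],
    "has_churned": [0.25, 0.38, 0.45, 0.55, 0.68, 0.80],
})

# Below 0.5 rounds to 0 (won't churn); above 0.5 rounds to 1 (will churn)
prediction_data["most_likely_outcome"] = np.round(prediction_data["has_churned"])
print(prediction_data)
```

One edge case to be aware of: numpy rounds exact halves to the nearest even number, so a probability of exactly 0.5 becomes 0.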

6. Visualizing most likely outcome

We can plot the most likely outcome by using the prediction data with the numbers we just calculated. For recently active customers, the most likely outcome is that they don't churn. Otherwise, the most likely outcome is that they churn.
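A sketch of that plot, assuming the same kind of hypothetical random data and column names as before. The rounded predictions are drawn as red points at 0 or 1 on top of the regplot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from statsmodels.formula.api import logit

# Hypothetical churn-like data (column names are assumptions)
rng = np.random.default_rng(42)
recency = rng.normal(0, 1, 400)
true_prob = 1 / (1 + np.exp(-(-0.5 + 0.6 * recency)))
churn = pd.DataFrame({
    "time_since_last_purchase": recency,
    "has_churned": rng.binomial(1, true_prob),
})

mdl_churn = logit("has_churned ~ time_since_last_purchase", data=churn).fit(disp=0)
explanatory_data = pd.DataFrame({"time_since_last_purchase": np.arange(-1, 2.25, 0.25)})
prediction_data = explanatory_data.assign(has_churned=mdl_churn.predict(explanatory_data))

# Most likely outcome: the predicted probability rounded to 0 or 1
prediction_data["most_likely_outcome"] = np.round(prediction_data["has_churned"])

fig, ax = plt.subplots()
sns.regplot(x="time_since_last_purchase", y="has_churned", data=churn,
            ci=None, logistic=True, ax=ax)
# Red points sit at 0 (won't churn) or 1 (will churn)
sns.scatterplot(x="time_since_last_purchase", y="most_likely_outcome",
                data=prediction_data, color="red", ax=ax)
plt.show()
```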

7. Odds

There is another way to talk about binary responses, commonly used in gambling. The odds are the probability that something happens, divided by the probability that it doesn't. For example, a probability of 0.25 is the same as odds of "three to one against", because the probability of the event not happening is 0.75, which is three times as much. The plot shows the relationship between probability and odds.
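Writing p for the probability of the event, the "three to one against" example works out as:

$$
\text{odds} = \frac{p}{1 - p} = \frac{0.25}{1 - 0.25} = \frac{0.25}{0.75} = \frac{1}{3}
$$

Odds of 1/3 mean the event is three times less likely to happen than not to happen.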

8. Calculating odds

We can calculate the odds by dividing the predicted response probability by one minus that number.
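A minimal sketch of the calculation on a few made-up probabilities; the column name `odds` is an assumption.

```python
import pandas as pd

# Hypothetical predicted probabilities of churning
prediction_data = pd.DataFrame({"has_churned": [0.25, 0.5, 0.75, 0.8]})

# Odds = probability it happens / probability it doesn't
prediction_data["odds"] = prediction_data["has_churned"] / (1 - prediction_data["has_churned"])
print(prediction_data)
```

A probability of 0.5 gives odds of exactly one; probabilities above 0.5 give odds above one.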

9. Visualizing odds

It doesn't make sense to visualize odds with the original data points, so we need a new plot. To create a plot with a continuous line, we can use seaborn's lineplot function. Here, the dotted line where the odds are one indicates where churning is just as likely as not churning; it was added with matplotlib's axhline function. In the bottom-left, the predicted odds are below one, so the chance of churning is less than the chance of not churning. In the top-right, the chance of churning is about five times the chance of not churning.

10. Visualizing log odds

One nice property of logistic regression odds is that, on a log scale, they change linearly with the explanatory variable. This plot adds a logarithmic y-axis scale.

11. Calculating log odds

This nice property of the logarithm of odds means log-odds is another common way of describing logistic regression predictions. In fact, the log-odds is also known as the logit, hence the name of the function you've been using to model logistic regression.
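To see that the log odds really are linear, this sketch computes probabilities from hypothetical coefficients, takes the log of the odds, and checks that the result matches the straight line intercept + slope * x.

```python
import numpy as np
import pandas as pd

intercept, slope = -0.5, 0.6  # hypothetical coefficients, not from a real fitted model
explanatory = np.arange(-1, 2.25, 0.25)

prediction_data = pd.DataFrame({"time_since_last_purchase": explanatory})
# Probabilities from the logistic (inverse-logit) function
prediction_data["has_churned"] = 1 / (1 + np.exp(-(intercept + slope * explanatory)))
prediction_data["odds"] = prediction_data["has_churned"] / (1 - prediction_data["has_churned"])
# Log odds, also called the logit
prediction_data["log_odds"] = np.log(prediction_data["odds"])

# The log odds recover the linear predictor
print(np.allclose(prediction_data["log_odds"], intercept + slope * explanatory))  # True
```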

12. All predictions together

Here are all the values calculated in the prediction dataset. Some column names are abbreviated for better printing.
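A sketch of assembling all four ways of describing the prediction in one DataFrame. The coefficients, the full column names, and the abbreviations used for printing are all hypothetical.

```python
import numpy as np
import pandas as pd

intercept, slope = -0.5, 0.6  # hypothetical coefficients standing in for a fitted model
explanatory_data = pd.DataFrame({"time_since_last_purchase": np.arange(-1, 2.25, 0.5)})
linear_predictor = intercept + slope * explanatory_data["time_since_last_purchase"]

# Probability of churning
prediction_data = explanatory_data.assign(has_churned=1 / (1 + np.exp(-linear_predictor)))
# Most likely outcome and odds, both derived from the probability
prediction_data = prediction_data.assign(
    most_likely_outcome=np.round(prediction_data["has_churned"]),
    odds_ratio=prediction_data["has_churned"] / (1 - prediction_data["has_churned"]),
)
# Log odds
prediction_data["log_odds_ratio"] = np.log(prediction_data["odds_ratio"])

# Abbreviated column names for more compact printing
print(prediction_data.rename(columns={
    "time_since_last_purchase": "time",
    "has_churned": "prob",
    "most_likely_outcome": "most_likely",
    "odds_ratio": "odds",
    "log_odds_ratio": "log_odds",
}))
```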

13. Comparing scales

Each way of describing responses has different benefits. The most likely outcome is the easiest to understand because the answer is always yes or no, but it lacks precision. Probabilities and odds are still fairly easy for a data-literate audience to understand. However, because these predictions are non-linear in the explanatory variable, it is hard to reason about how a change in the explanatory variable will change the response. Log odds are difficult to interpret for individual values, but their linear relationship with the explanatory variables makes it easy to reason about changes.
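The "easy to reason about changes" point can be made concrete. Write $\beta_0$ and $\beta_1$ for the model's intercept and slope (notation introduced here for illustration):

$$
\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x
\quad\Longrightarrow\quad
\log\big(\text{odds}(x+1)\big) - \log\big(\text{odds}(x)\big) = \beta_1,
\qquad
\frac{\text{odds}(x+1)}{\text{odds}(x)} = e^{\beta_1}
$$

A one-unit increase in the explanatory variable always adds $\beta_1$ to the log odds, which multiplies the odds by $e^{\beta_1}$. In contrast, the resulting change in probability depends on where you start.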

14. Let's practice!

Let's make some predictions.
