1. Using the logistic regression model
Now that you know how to construct a logistic regression model, it is time to learn how to make predictions with the logistic regression model.
2. The logistic regression function
In the programming exercise, you constructed a model that predicts who will donate using three predictors. The formula derived is given here. Assume you want to predict how likely it is that a 72-year-old female donor who last donated 120 days ago will donate to the new campaign.
Recall that a logistic regression model is a linear regression formula wrapped in a logistic function, which is the inverse of the logit. So all you need to do is replace the predictors with the given values, and then pass the result through the logistic function.
As the donor is female, gender_F is one, and the other variables are given as well. The result of filling out the variables in the regression function is -1.45. Applying the logistic function to this number gives 0.19, which means that there is a 19% chance that this donor will donate for the next campaign.
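The last step of that calculation can be checked with a few lines of Python. This is only a sketch: the value -1.45 is the linear result quoted above, and the helper name `logistic` is made up for illustration. The model coefficients themselves are not repeated here, so the linear part is not recomputed.

```python
import math

def logistic(z):
    """Map a linear score z to a probability: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Linear part of the model for this donor, as quoted in the text.
linear_score = -1.45

probability = logistic(linear_score)
print(round(probability, 2))  # 0.19
```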
3. Making predictions in Python
Fortunately, you don't need to calculate the predicted probabilities manually in Python.
Consider again the 72-year-old woman who donated 120 days ago. If you collect her data in a list, making sure you add the values in the same order as they appear in the logistic regression model, you can calculate the prediction by passing this list as an argument to the predict_proba method on the logreg object. Because predict_proba expects one row per observation, the values for a single donor go in a list that is itself wrapped in a list.
The output is an array that has two numbers. The first number is the probability that the donor will not donate (target 0) and the second number is the probability that the donor will donate (target 1). This last number is the one that we are interested in.
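A minimal sketch of that call, assuming scikit-learn's LogisticRegression. The training data below is randomly generated just so the example runs, and the predictor order (gender_F, age, days since last gift) is an assumption, so the printed probabilities will not match the ones from the course model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up training data, only so that there is a fitted model to call.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.integers(0, 2, n),     # gender_F (assumed first predictor)
    rng.integers(18, 90, n),   # age (assumed second predictor)
    rng.integers(1, 1000, n),  # days since last gift (assumed third predictor)
])
y = rng.integers(0, 2, n)      # random 0/1 target, not real donation data

logreg = LogisticRegression(max_iter=1000)
logreg.fit(X, y)

# One donor: female, aged 72, last gift 120 days ago.
# predict_proba expects a 2D input, so the row is wrapped in an outer list.
new_donor = [[1, 72, 120]]
probs = logreg.predict_proba(new_donor)

# Columns follow logreg.classes_, here [0, 1].
p_no_donation, p_donation = probs[0]
print(p_donation)
```

The columns of the output follow the order in logreg.classes_, which is why the second number belongs to target 1.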
The probability that this donor will donate, 18%, seems pretty low. However, it is quite high if you compare it to 5%, the overall chance that someone in the population will donate.
4. Making predictions in Python
Often, you are interested in making predictions for a large group of people. For instance, if you want to decide which donors to send a letter to, you might want to make predictions for the entire population, and send a letter to the donors with the highest probabilities only.
Assume that all up-to-date information about the donors is in a DataFrame called current_data. You can use indexing to select the relevant columns only, namely the predictors that appear in the model, in the same order as in the model. You can then pass this DataFrame to the predict_proba method to obtain a prediction for each observation in the DataFrame. These predictions are now ready for use, for instance to decide which donors to send a letter to.
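Scoring a whole DataFrame could look roughly like this. Again a sketch under assumptions: the column names, the helper make_donors, and the data are all invented for illustration, and sending a letter to the top 10 donors is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Assumed predictor names, in the order used to fit the model.
predictors = ["gender_F", "age", "days_since_last_gift"]
rng = np.random.default_rng(1)

def make_donors(n):
    """Build a made-up donor table (hypothetical helper)."""
    return pd.DataFrame({
        "donor_id": np.arange(n),
        "gender_F": rng.integers(0, 2, n),
        "age": rng.integers(18, 90, n),
        "days_since_last_gift": rng.integers(1, 1000, n),
    })

# Fit a model on invented historical data so the example is runnable.
basetable = make_donors(300)
basetable["target"] = rng.integers(0, 2, 300)
logreg = LogisticRegression(max_iter=1000)
logreg.fit(basetable[predictors], basetable["target"])

# Up-to-date donor information, one row per donor.
current_data = make_donors(50)

# Keep only the model's predictors, then take the column for target 1.
scored = current_data.assign(
    p_donate=logreg.predict_proba(current_data[predictors])[:, 1]
)

# Select the donors with the highest predicted probabilities.
top_donors = scored.sort_values("p_donate", ascending=False).head(10)
print(top_donors[["donor_id", "p_donate"]])
```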
5. Let's practice!
Now it's your turn. Let's use the logistic regression model to make predictions!