Computing and describing predictions
1. Computing and describing predictions
Logistic regression models usually produce two types of predictions: the probability and the predicted class, 0 or 1. In practice, the predicted class is used more often. In this video, you will learn how to go from probabilities to classes and then summarize the predictive power of the model using the confusion matrix.
2. Computing predictions
Given a fitted logistic regression model, we can obtain the fitted values directly.
3. Computing predictions
And using new measurements of x and the model formula, we obtain predicted values.
4. Computing predictions
Let's use the horseshoe crab model for illustration. Assume we measured a new crab with a weight of 2.85. Plugging this value into the model formula, we obtain a predicted probability of a satellite of 0.814.
5. Predictions in Python
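As a minimal sketch of the single-crab calculation, we can apply the inverse-logit to the linear predictor by hand. The intercept and slope below are assumed values used only for illustration; the transcript does not state the fitted coefficients, and the function name is hypothetical.

```python
import math

# Assumed coefficients, for illustration only (not given in the transcript).
INTERCEPT = -3.6947
SLOPE = 1.8151


def predicted_probability(weight, intercept=INTERCEPT, slope=SLOPE):
    """Return the inverse-logit of intercept + slope * weight."""
    linear_predictor = intercept + slope * weight
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# New crab measured at weight 2.85
prob_new_crab = predicted_probability(2.85)
print(round(prob_new_crab, 3))  # 0.814 under the assumed coefficients
```

With these assumed coefficients the result rounds to the 0.814 quoted for the new crab; a different fit would give a different number.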
In Python, we can compute predictions for a whole new set of measurements with a single call to predict. Note that the dataframe new_data has to contain the model variables under the same column names that were used to fit the model.
6. From probabilities to classes
Having computed predicted probabilities, we can now classify them using a specified cutoff value. Recall the figure from chapter 1, where we assumed a cutoff value of 0.5.
7. Computing class predictions
Using the horseshoe crab model, let's walk through the computation. First, we extract the estimated probabilities using the fittedvalues attribute and convert them into an array using values. Next, we define the probability cutoff value at 0.4. Then the predicted class is computed based on the cutoff, i.e. all estimated values greater than 0.4 are classified as 1, and all others as 0.
8. Computing class predictions
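The three steps of the walkthrough (extract probabilities, set the cutoff, classify) can be sketched as follows. The probabilities are made-up values standing in for the model's fittedvalues, so that the sketch runs without a fitted model.

```python
import numpy as np
import pandas as pd

# Made-up probabilities standing in for model.fittedvalues (a pandas Series).
fitted = pd.Series([0.12, 0.35, 0.40, 0.41, 0.58, 0.77, 0.93])

# Step 1: convert the Series into a NumPy array.
probabilities = fitted.values

# Step 2: define the cutoff.
cutoff = 0.4

# Step 3: strictly greater than the cutoff -> class 1, otherwise class 0.
predicted_class = np.where(probabilities > cutoff, 1, 0)
print(predicted_class)  # [0 0 0 1 1 1 1]
```

Note that a probability exactly equal to the cutoff (0.40 here) is classified as 0, because the comparison is strict.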
Finally, we count the number of observations in each class and report them in a table. For a cutoff of 0.4, there are 151 observations classified as 1 and 22 as 0. Applying a different cutoff, say 0.5, gives different class counts. Note that class predictions are greatly influenced by the proportion of events in the sample. For example, in samples with a very small number of events, the model fit may never give estimated probabilities greater than 0.4, in which case we would never predict class 1.
9. Confusion matrix
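To illustrate how the class counts depend on the cutoff, and what happens when no estimated probability exceeds it, here is a small sketch. All probabilities are made up, and class_counts is a hypothetical helper, not a library function.

```python
import numpy as np


def class_counts(probabilities, cutoff):
    """Count predicted classes 0 and 1 for a given cutoff (hypothetical helper)."""
    predicted = (np.asarray(probabilities) > cutoff).astype(int)
    counts = np.bincount(predicted, minlength=2)
    return {0: int(counts[0]), 1: int(counts[1])}


# Made-up probabilities for illustration.
probs = [0.12, 0.35, 0.45, 0.48, 0.58, 0.77, 0.93]
counts_04 = class_counts(probs, 0.4)
counts_05 = class_counts(probs, 0.5)
print(counts_04)  # {0: 2, 1: 5}
print(counts_05)  # {0: 4, 1: 3}

# A rare-event sample: no probability exceeds 0.4, so class 1 is never predicted.
rare_probs = [0.02, 0.05, 0.11, 0.08, 0.30]
counts_rare = class_counts(rare_probs, 0.4)
print(counts_rare)  # {0: 5, 1: 0}
```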
Once we have determined class predictions, we can describe the performance of the model with the confusion matrix, which compares the predicted classes with the actual classes. The confusion matrix shows how many observations the model classified correctly and how many incorrectly.
10. Confusion matrix - True Negatives
The correct classifications are TN (true negatives), where actual nonevents are predicted as nonevents, and
11. Confusion matrix - True Positives
TP (true positives), where actual events are predicted as events.
12. Confusion matrix - False Positives
FP, or false positives, are nonevents incorrectly predicted as events.
13. Confusion matrix - False Negatives
Similarly, FN, or false negatives, are events incorrectly predicted as nonevents. In practice, we would like to find a balance between FPs and FNs by understanding the effect of different cutoffs and the costs associated with each error type.
14. Confusion matrix in Python
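As a minimal sketch of the four cells, we can tabulate actual against predicted classes with pandas crosstab and read off TN, FP, FN and TP. The two arrays are made up for illustration and are not the crab results.

```python
import numpy as np
import pandas as pd

# Made-up actual and predicted classes (illustration only).
actual = np.array([0, 0, 0, 1, 1, 1, 1, 0, 1, 1])
predicted = np.array([0, 1, 1, 1, 1, 0, 1, 0, 1, 1])

# Rows are actual classes, columns are predicted classes; margins adds totals.
conf_matrix = pd.crosstab(
    actual,
    predicted,
    rownames=["Actual"],
    colnames=["Predicted"],
    margins=True,
)
print(conf_matrix)

tn = conf_matrix.loc[0, 0]  # actual 0, predicted 0
fp = conf_matrix.loc[0, 1]  # actual 0, predicted 1
fn = conf_matrix.loc[1, 0]  # actual 1, predicted 0
tp = conf_matrix.loc[1, 1]  # actual 1, predicted 1
print(tn, fp, fn, tp)  # 2 2 1 5
```

The diagonal cells (TN and TP) are the correct classifications, and the off-diagonal cells (FP and FN) are the two error types.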
In Python, we can use the pandas crosstab function, which takes the actual and predicted classes. We can also add rownames and colnames as well as totals, making the table more informative and easier to read. We can see that the model makes many false positive mistakes, namely 47, by incorrectly predicting the presence of a satellite crab when there is none. Also, notice the imbalance between true negatives and true positives, where 104 true positives drive the overall accuracy of the model. When analyzing the confusion matrix, you should remember to ask yourself whether the cutoff value is optimal, i.e. whether it logically applies to your problem, since changing the cutoff value will change the predicted classes and hence the confusion matrix. For more details on the confusion matrix metrics, you can refer to other DataCamp courses.
15. Let's practice!
Now let's review the material with some practice problems!