Calculating the confusion matrix
A confusion matrix (occasionally called a confusion table) is the basis of all performance metrics for models with a categorical response (such as a logistic regression). It contains the counts of each combination of actual response and predicted response. In this case, where there are two possible responses (churn or not churn), there are four possible outcomes, listed below and illustrated in the short sketch after the list.
- True positive: The customer churned and the model predicted they would.
- False positive: The customer didn't churn, but the model predicted they would.
- True negative: The customer didn't churn and the model predicted they wouldn't.
- False negative: The customer churned, but the model predicted they wouldn't.
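As a minimal toy illustration (made-up labels, not the course's churn data, with 1 meaning churned and 0 meaning not churned), the four outcomes can be counted by cross-tabulating actual against predicted responses:

import pandas as pd

# Hypothetical labels: 1 = churned, 0 = did not churn
actual = pd.Series([1, 0, 1, 0, 0, 1], name="actual")
predicted = pd.Series([1, 1, 0, 0, 0, 1], name="predicted")

# Rows are actual responses, columns are predicted responses;
# each cell counts one of the four outcomes
print(pd.crosstab(actual, predicted))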
churn and mdl_churn_vs_relationship are available.
This exercise is part of the course Introduction to Regression with statsmodels in Python.
Exercise instructions
- Get the actual responses by subsetting the has_churned column of the dataset. Assign to actual_response.
- Get the "most likely" predicted responses from the model. Assign to predicted_response.
- Create a DataFrame from actual_response and predicted_response. Assign to outcomes.
- Print outcomes as a table of counts, representing the confusion matrix. This has been done for you.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import pandas, needed to build the outcomes DataFrame below
import pandas as pd

# Get the actual responses
actual_response = ____
# Get the predicted responses
predicted_response = ____
# Create outcomes as a DataFrame of both Series
outcomes = pd.DataFrame({____,
____})
# Print the outcomes
print(outcomes.value_counts(sort=False))
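For reference, one possible completion of the template is sketched below. It assumes churn is a pandas DataFrame containing the has_churned column and mdl_churn_vs_relationship is a fitted statsmodels logistic regression results object, so calling .predict() with no arguments returns the fitted probabilities, which can be rounded to get the most likely response.

# A possible completion (a sketch; assumes churn and
# mdl_churn_vs_relationship are defined as described above)
import numpy as np
import pandas as pd

# Get the actual responses from the has_churned column
actual_response = churn["has_churned"]

# Get the "most likely" predicted responses by rounding
# the model's fitted probabilities to 0 or 1
predicted_response = np.round(mdl_churn_vs_relationship.predict())

# Create outcomes as a DataFrame of both Series
outcomes = pd.DataFrame({"actual_response": actual_response,
                         "predicted_response": predicted_response})

# Print the counts of each actual/predicted combination,
# i.e. the confusion matrix in long form
print(outcomes.value_counts(sort=False))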