Calculating the confusion matrix

A confusion matrix (occasionally called a confusion table) is the basis of all performance metrics for models with a categorical response, such as logistic regression. It contains the counts of each combination of actual response and predicted response. In this case, where there are two possible responses (churn or not churn), there are four possible outcomes, listed below; a toy example follows the list.

  1. True positive: The customer churned and the model predicted they would.
  2. False positive: The customer didn't churn, but the model predicted they would.
  3. True negative: The customer didn't churn and the model predicted they wouldn't.
  4. False negative: The customer churned, but the model predicted they wouldn't.
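
As a concrete illustration, here is a minimal sketch that counts these four outcomes on invented toy data (the values below are made up and are not from the churn dataset):

# Toy data: 1 = churned, 0 = didn't churn (invented values)
import pandas as pd

actual = pd.Series([1, 0, 1, 1, 0, 0, 1, 0], name="actual")
predicted = pd.Series([1, 0, 0, 1, 1, 0, 1, 0], name="predicted")

# Cross-tabulate actual vs. predicted responses: the four cells are
# the true/false positive/negative counts
print(pd.crosstab(actual, predicted))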

churn and mdl_churn_vs_relationship are available.
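
For context, a hedged sketch of how such a model might have been fitted with statsmodels' formula API (the explanatory variable time_since_first_purchase is an assumption; the exercise only guarantees that the fitted model exists):

# Assumed setup: logistic regression of churn vs. relationship length
# (the explanatory variable name here is hypothetical)
from statsmodels.formula.api import logit

mdl_churn_vs_relationship = logit("has_churned ~ time_since_first_purchase",
                                  data=churn).fit()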

This exercise is part of the course Introduction to Regression with statsmodels in Python.

Exercise instructions

  • Get the actual responses by subsetting the has_churned column of the churn dataset. Assign to actual_response.
  • Get the "most likely" predicted responses from the model by rounding its predicted probabilities to zero or one. Assign to predicted_response.
  • Create a DataFrame from actual_response and predicted_response. Assign to outcomes.
  • Print outcomes as a table of counts, representing the confusion matrix. This has been done for you.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Get the actual responses
actual_response = ____

# Get the predicted responses
predicted_response = ____

# Create outcomes as a DataFrame of both Series
outcomes = pd.DataFrame({____,
                         ____})

# Print the outcomes
print(outcomes.value_counts(sort=False))
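
One possible completion, as a hedged sketch: it assumes churn is a pandas DataFrame with a has_churned column and that mdl_churn_vs_relationship is a fitted statsmodels logit model, so calling .predict() with no arguments returns the fitted probabilities for the training data.

import numpy as np
import pandas as pd

# Get the actual responses
actual_response = churn["has_churned"]

# Get the "most likely" predicted responses: .predict() with no new
# data returns fitted probabilities; rounding maps them to 0 or 1
predicted_response = np.round(mdl_churn_vs_relationship.predict())

# Create outcomes as a DataFrame of both Series
outcomes = pd.DataFrame({"actual_response": actual_response,
                         "predicted_response": predicted_response})

# Print the counts of each actual/predicted pair (the confusion matrix)
print(outcomes.value_counts(sort=False))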