Calculating the confusion matrix

A confusion matrix (occasionally called a confusion table) is the basis of all performance metrics for models with a categorical response (such as a logistic regression). It contains the count of each pair of actual response and predicted response. Here there are two possible responses (churn or no churn), so there are four possible outcomes.

True positive: The customer churned and the model predicted they would.
False positive: The customer didn't churn, but the model predicted they would.
True negative: The customer didn't churn and the model predicted they wouldn't.
False negative: The customer churned, but the model predicted they wouldn't.
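The four outcomes above can be illustrated with a small sketch. The data below is a hypothetical toy example, not the churn dataset, and the helper label_outcome is an illustrative name; 1 stands for "churned" and 0 for "did not churn".

```python
import pandas as pd

# Hypothetical toy data (not the course's churn dataset):
# 1 = churned, 0 = did not churn
actual = pd.Series([1, 0, 0, 1, 1, 0, 0, 1], name="actual_response")
predicted = pd.Series([1, 1, 0, 0, 1, 0, 0, 1], name="predicted_response")


def label_outcome(actual_value, predicted_value):
    """Name the outcome for one actual/predicted pair (illustrative helper)."""
    if predicted_value == 1:
        return "true positive" if actual_value == 1 else "false positive"
    return "true negative" if actual_value == 0 else "false negative"


# Label every customer, then count how often each outcome occurs
labels = pd.Series(
    [label_outcome(a, p) for a, p in zip(actual, predicted)],
    name="outcome",
)
outcome_counts = labels.value_counts()
print(outcome_counts)
```

For this toy data there are three true positives, three true negatives, one false positive, and one false negative; those four counts are exactly the cells of a confusion matrix.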

The churn dataset and the model mdl_churn_vs_relationship are available.

This exercise is part of the course

Introduction to Regression with statsmodels in Python

Exercise instructions

Get the actual responses by subsetting the has_churned column of the dataset. Assign to actual_response.
Get the "most likely" predicted responses from the model. Assign to predicted_response.
Create a DataFrame from actual_response and predicted_response. Assign to outcomes.
Print outcomes as a table of counts, representing the confusion matrix. This has been done for you.

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Get the actual responses
actual_response = ____

# Get the predicted responses
predicted_response = ____

# Create outcomes as a DataFrame of both Series
outcomes = pd.DataFrame({____,
                         ____})

# Print the outcomes
print(outcomes.value_counts(sort=False))
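
One possible way to complete the template is sketched below. The course's churn dataset and fitted model are not reproduced here, so the sketch builds a small synthetic stand-in: the explanatory column name time_since_first_purchase, the simulated coefficients, and the sample size are all assumptions for illustration. It also assumes the model was fit with the logit() function from statsmodels, that calling predict() with no arguments returns the fitted probabilities, and that rounding those probabilities gives the "most likely" response.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import logit

# Synthetic stand-in for the churn dataset (assumption: not the course data)
rng = np.random.default_rng(42)
time_since_first_purchase = rng.normal(size=200)
churn_probability = 1 / (1 + np.exp(0.3 + 0.8 * time_since_first_purchase))
churn = pd.DataFrame({
    "time_since_first_purchase": time_since_first_purchase,
    "has_churned": rng.binomial(1, churn_probability),
})

# Stand-in for the pre-fitted model (assumed formula; disp=0 silences fit output)
mdl_churn_vs_relationship = logit(
    "has_churned ~ time_since_first_purchase", data=churn
).fit(disp=0)

# Get the actual responses
actual_response = churn["has_churned"]

# Get the predicted responses: round fitted probabilities to 0 or 1
predicted_response = np.round(mdl_churn_vs_relationship.predict())

# Create outcomes as a DataFrame of both Series
outcomes = pd.DataFrame({
    "actual_response": actual_response,
    "predicted_response": predicted_response,
})

# Print the outcomes as a table of counts (the confusion matrix)
print(outcomes.value_counts(sort=False))
```

Each row of the printed table is one actual/predicted pair together with its count, so the counts add up to the number of customers in the dataset.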