Calculating the confusion matrix
A confusion matrix (occasionally called a confusion table) is the basis of many performance metrics for models with a categorical response (such as a logistic regression). It contains the count of each pair of actual and predicted responses. In this case, where there are two possible responses (churn or not churn), there are four possible outcomes:
- The customer churned and the model predicted that.
- The customer churned but the model didn't predict that.
- The customer didn't churn but the model predicted they did.
- The customer didn't churn and the model predicted that.
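As a rough illustration of the four outcomes listed above, the sketch below counts them for a small made-up example. The vectors toy_actual and toy_predicted are hypothetical and are not taken from the churn dataset; 1 stands for "churned" and 0 for "didn't churn".

```r
# Hypothetical toy data: 1 = churned, 0 = didn't churn
toy_actual    <- c(1, 1, 1, 0, 0, 0, 0, 1)
toy_predicted <- c(1, 0, 1, 0, 1, 0, 0, 1)

# Cross-tabulate predictions (rows) against actual responses (columns)
toy_outcomes <- table(predicted = toy_predicted, actual = toy_actual)
toy_outcomes

# Each cell corresponds to one of the four outcomes
toy_outcomes["1", "1"]  # churned, and the model predicted that
toy_outcomes["0", "1"]  # churned, but the model didn't predict that
toy_outcomes["1", "0"]  # didn't churn, but the model predicted they did
toy_outcomes["0", "0"]  # didn't churn, and the model predicted that
```

Putting predictions in rows and actual responses in columns is only one convention; swapping the two arguments of table() transposes the matrix.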
churn and mdl_churn_vs_relationship are available.
This exercise is part of the course Introduction to Regression in R.
Exercise instructions
- Get the actual responses from the has_churned column of the dataset. Assign to actual_response.
- Get the "most likely" predicted responses from the model. Assign to predicted_response.
- Create a table of counts from the actual and predicted response vectors. Assign to outcomes.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Get the actual responses from the dataset
actual_response <- ___
# Get the "most likely" responses from the model
predicted_response <- ___
# Create a table of counts
outcomes <- ___
# See the result
outcomes
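
One way the blanks might be filled in is sketched below. Because churn and mdl_churn_vs_relationship exist only in the exercise environment, this sketch simulates stand-ins: churn_sim, mdl_sim, and the explanatory variable relationship_length are hypothetical names and data, not the course's actual objects. It also assumes that the "most likely" response is obtained by rounding the fitted probabilities to 0 or 1, which is a common approach but not necessarily the intended solution.

```r
# Simulated stand-in for the churn dataset (hypothetical data and column name)
set.seed(42)
n <- 200
churn_sim <- data.frame(relationship_length = rnorm(n))
# In this simulation, longer relationships make churn less likely
churn_sim$has_churned <- rbinom(
  n, size = 1, prob = plogis(-1.5 * churn_sim$relationship_length)
)

# Stand-in for mdl_churn_vs_relationship: a logistic regression
mdl_sim <- glm(
  has_churned ~ relationship_length,
  data = churn_sim,
  family = binomial
)

# Get the actual responses from the dataset
actual_response <- churn_sim$has_churned

# Get the "most likely" responses from the model:
# fitted() returns probabilities, and round() maps them to 0 or 1
predicted_response <- round(fitted(mdl_sim))

# Create a table of counts
outcomes <- table(predicted_response, actual_response)

# See the result
outcomes
```

In the exercise itself, the same three assignments would presumably use churn and mdl_churn_vs_relationship in place of the simulated objects.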