Accuracy and loss functions
A simple measure of performance in binary classification is accuracy: the proportion of correctly classified observations.
Classification methods such as logistic regression aim to (approximately) minimize the proportion of incorrectly classified observations. This proportion can be thought of as a penalty (loss) function for the classifier. Smaller penalty = better.
Since we know how to make predictions with our model, we can also compute the proportion of incorrect predictions.
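To make the relationship concrete, here is a minimal sketch using a tiny made-up vector of classes and predictions (not the course's `alc` data): accuracy and the mean prediction error are complementary proportions that always sum to one.

```r
# Toy data (hypothetical, for illustration only)
actual    <- c(TRUE, FALSE, TRUE, TRUE, FALSE)   # observed classes
predicted <- c(TRUE, FALSE, FALSE, TRUE, TRUE)   # model's predicted classes

accuracy <- mean(actual == predicted)  # proportion classified correctly
loss     <- mean(actual != predicted)  # proportion classified incorrectly

accuracy  # 0.6
loss      # 0.4
```

Note that `mean()` of a logical vector gives a proportion, because `TRUE`/`FALSE` are coerced to `1`/`0`.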
This exercise is part of the course Helsinki Open Data Science.
Exercise instructions
- Define the loss function `loss_func`.
- Execute the call to the loss function with `prob = 0`, meaning you define the probability of `high_use` as zero for each individual. What is the interpretation of the resulting proportion?
- Adjust the code: change the `prob` argument in the loss function to `prob = 1`. What kind of prediction does this correspond to? What is the interpretation of the resulting proportion?
- Adjust the code again: change the `prob` argument by giving it the prediction probabilities in `alc` (the column `probability`). What is the interpretation of the resulting proportion?
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# the logistic regression model m and dataset alc with predictions are available
# define a loss function (mean prediction error)
loss_func <- function(class, prob) {
n_wrong <- abs(class - prob) > 0.5
mean(n_wrong)
}
# call loss_func to compute the proportion of wrong predictions in the (training) data
loss_func(class = alc$high_use, prob = 0)
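Since the model `m` and the course's `alc` data are only available inside the exercise environment, the three steps above can be sketched end to end with simulated data. The column names `high_use` and `probability` follow the exercise; the data itself is invented here purely for illustration.

```r
# Simulated stand-in for the course's alc dataset (hypothetical data)
set.seed(1)
alc <- data.frame(high_use = runif(200) < 0.3)
# fake predicted probabilities, loosely tied to the true class
alc$probability <- ifelse(alc$high_use, rbeta(200, 4, 2), rbeta(200, 2, 4))

# same loss function as in the exercise: a prediction is wrong when the
# predicted probability is on the far side of 0.5 from the true class (0 or 1)
loss_func <- function(class, prob) {
  n_wrong <- abs(class - prob) > 0.5
  mean(n_wrong)
}

# prob = 0: predict "not high_use" for everyone;
# the loss equals the proportion of high users in the data
loss_func(class = alc$high_use, prob = 0)

# prob = 1: predict "high_use" for everyone;
# the loss equals the proportion of non-high users
loss_func(class = alc$high_use, prob = 1)

# actual predicted probabilities: the loss is the model's training error
loss_func(class = alc$high_use, prob = alc$probability)
```

A useful sanity check: the `prob = 0` and `prob = 1` losses always sum to one, since every observation is misclassified by exactly one of the two constant predictions.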