Accuracy and loss functions
A simple measure of performance in binary classification is accuracy: the proportion of correctly classified observations.
Classification methods such as logistic regression aim to (approximately) minimize the proportion of incorrectly classified observations. This proportion can be thought of as a penalty (loss) function for the classifier. Smaller penalty = better.
Since we know how to make predictions with our model, we can also compute the proportion of incorrect predictions.
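To make the relationship concrete, here is a minimal sketch using a tiny made-up vector of classes and predictions (not the course's `alc` data): accuracy and the mean prediction error are complementary proportions that always sum to one.

```r
# Toy data (hypothetical, for illustration only)
actual    <- c(TRUE, FALSE, TRUE, TRUE, FALSE)   # observed classes
predicted <- c(TRUE, FALSE, FALSE, TRUE, TRUE)   # model's predicted classes

accuracy <- mean(actual == predicted)  # proportion classified correctly
loss     <- mean(actual != predicted)  # proportion classified incorrectly

accuracy  # 0.6
loss      # 0.4
```

Note that `mean()` of a logical vector gives a proportion, because `TRUE`/`FALSE` are coerced to `1`/`0`.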
This exercise is part of the course Helsinki Open Data Science.
Exercise instructions
- Define the loss function `loss_func`.
- Execute the call to the loss function with `prob = 0`, meaning you define the probability of `high_use` as zero for each individual. What is the interpretation of the resulting proportion?
- Adjust the code: change the `prob` argument in the loss function to `prob = 1`. What kind of prediction does this correspond to? What is the interpretation of the resulting proportion?
- Adjust the code again: change the `prob` argument by giving it the prediction probabilities in `alc` (the column `probability`). What is the interpretation of the resulting proportion?
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# the logistic regression model m and dataset alc with predictions are available
# define a loss function (mean prediction error)
loss_func <- function(class, prob) {
n_wrong <- abs(class - prob) > 0.5
mean(n_wrong)
}
# call loss_func to compute the proportion of wrong predictions in the (training) data
loss_func(class = alc$high_use, prob = 0)
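Since the model `m` and the course's `alc` data are only available inside the exercise environment, the three steps above can be sketched end to end with simulated data. The column names `high_use` and `probability` follow the exercise; the data itself is invented here purely for illustration.

```r
# Simulated stand-in for the course's alc dataset (hypothetical data)
set.seed(1)
alc <- data.frame(high_use = runif(200) < 0.3)
# fake predicted probabilities, loosely tied to the true class
alc$probability <- ifelse(alc$high_use, rbeta(200, 4, 2), rbeta(200, 2, 4))

# same loss function as in the exercise: a prediction is wrong when the
# predicted probability is on the far side of 0.5 from the true class (0 or 1)
loss_func <- function(class, prob) {
  n_wrong <- abs(class - prob) > 0.5
  mean(n_wrong)
}

# prob = 0: predict "not high_use" for everyone;
# the loss equals the proportion of high users in the data
loss_func(class = alc$high_use, prob = 0)

# prob = 1: predict "high_use" for everyone;
# the loss equals the proportion of non-high users
loss_func(class = alc$high_use, prob = 1)

# actual predicted probabilities: the loss is the model's training error
loss_func(class = alc$high_use, prob = alc$probability)
```

A useful sanity check: the `prob = 0` and `prob = 1` losses always sum to one, since every observation is misclassified by exactly one of the two constant predictions.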