
Optimizing the threshold

You heard that the default threshold of 0.5 maximizes accuracy in theory, but you want to test what happens in practice. So you try out a number of different threshold values, record the accuracy each one gives, and from that determine the best-performing threshold. You then repeat the experiment for the F1 score. Is 0.5 the optimal threshold? Is the optimal threshold the same for accuracy and for the F1 score? Go ahead and find out! A scores matrix, obtained by scoring the test data, is available. The ground truth labels for the test data are also available as y_test. Finally, two NumPy functions are preloaded, argmin() and argmax(), which return the index of the minimum and maximum value in an array respectively, along with the metrics accuracy_score() and f1_score().
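As a small illustration of how argmin() and argmax() turn a list of metric values into a threshold, here is a minimal sketch. The metric values are made up for illustration and are not results from the course data; the names thresholds and metric_values are hypothetical.

```python
import numpy as np

# Hypothetical candidate thresholds and made-up metric values, one per threshold
thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]
metric_values = [0.40, 0.70, 0.65, 0.55, 0.60]

# argmax/argmin return a position in the list, not the value itself
best_idx = int(np.argmax(metric_values))
worst_idx = int(np.argmin(metric_values))

# Use that position to look up the corresponding threshold
best_threshold = thresholds[best_idx]
worst_threshold = thresholds[worst_idx]
print(best_threshold, worst_threshold)
```

Because accuracy and F1 are both "higher is better" metrics, the index of the maximum is the one that points at the best threshold.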

This exercise is part of the course

Designing Machine Learning Workflows in Python

Exercise instructions

  • Create a range of threshold values that includes 0.0, 0.25, 0.5, 0.75 and 1.0.
  • Using a double list comprehension, store the predictions for each threshold value in the range above. Recall that you can obtain labels from a scores matrix for a given threshold thr using [s[1] > thr for s in scores].
  • Run through that list and compute the accuracy for each threshold. Repeat for the F1 score.
  • Using either argmin() or argmax(), find the optimal threshold for accuracy, and for F1.
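The first two steps above can be sketched on a tiny example. This assumes each row of the scores matrix holds [score for class 0, score for class 1], which is what the s[1] indexing suggests; the three rows below are made up, and np.linspace() is just one possible way to build the equally spaced range.

```python
import numpy as np

# Five equally spaced values from 0 to 1: 0.0, 0.25, 0.5, 0.75, 1.0
t_range = np.linspace(0, 1, 5)

# Made-up scores matrix (assumption: column 1 is the positive-class score)
scores = [[0.9, 0.1],
          [0.6, 0.4],
          [0.3, 0.7]]

# Outer loop: one list of labels per threshold.
# Inner loop: one boolean label per scored example.
preds = [[bool(s[1] > thr) for s in scores] for thr in t_range]

for thr, p in zip(t_range, preds):
    print(thr, p)
```

Note that the comparison is strict, so with a threshold of 1.0 no example can ever be labelled positive, while with 0.0 every example with a positive-class score above zero is.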

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create a range of equally spaced threshold values
t_range = ____

# Store the predicted labels for each value of the threshold
preds = [[____ > thr for s in scores] for ____ in ____]

# Compute the accuracy for each threshold
accuracies = [____(____, ____) for p in preds]

# Compute the F1 score for each threshold
f1_scores = [____(____, ____) for p in preds]

# Report the optimal threshold for accuracy, and for F1
print(t_range[____(accuracies)], t_range[____(f1_scores)])
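
For reference, here is one possible way the template could be completed, run on made-up data rather than the exercise's preloaded scores and y_test. The toy labels and scores are assumptions chosen only so that the sketch runs on its own, and passing zero_division=0 to f1_score() is an optional choice made here to avoid a warning when a threshold predicts no positives.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Made-up stand-ins for the preloaded objects (NOT the course data).
# Each row of scores is [score for class 0, score for class 1].
y_test = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
p1 = [0.1, 0.1, 0.1, 0.1, 0.1, 0.3, 0.3, 0.3, 0.4, 0.4]
scores = [[1 - p, p] for p in p1]

# Candidate thresholds
t_range = [0.0, 0.25, 0.5, 0.75, 1.0]

# One list of predicted labels per threshold
preds = [[s[1] > thr for s in scores] for thr in t_range]

# Accuracy and F1 for each threshold (ground truth first, predictions second)
accuracies = [accuracy_score(y_test, p) for p in preds]
f1_scores = [f1_score(y_test, p, zero_division=0) for p in preds]

# Both metrics are "higher is better", so argmax picks the best index
best_acc_thr = t_range[np.argmax(accuracies)]
best_f1_thr = t_range[np.argmax(f1_scores)]
print(best_acc_thr, best_f1_thr)
```

On this imbalanced toy data the two metrics pick different thresholds (0.5 for accuracy, 0.25 for F1), which shows that they can disagree; it says nothing about what the exercise's real data will show.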