Session Ready
Exercise

Optimizing the threshold

You heard that the default value of 0.5 maximizes accuracy in theory, but you want to test what happens in practice. So you try out a number of different threshold values, to see what accuracy you get, and hence determine the best-performing threshold value. You repeat this experiment for the F1 score. Is 0.5 the optimal threshold? Is the optimal threshold for accuracy and for the F1 score the same? Go ahead and find out! You have a scores matrix available, obtained by scoring the test data. The ground truth labels for the test data is also available as y_test. Finally, two numpy functions are preloaded, argmin() and argmax(), which retrieve the index of the minimum and maximum values in an array respectively, in addition to the metrics accuracy_score() and f1_score().

Instructions
100 XP
  • Create a range of threshold values that include 0.0, 0.25, 0.5, 0.75 and 1.0.
  • Via double list comprehension, store the predictions for each threshold value in the range above. Recall that obtaining labels for a scores matrix using a threshold thr is possible using [s[1] > thr for s in scores].
  • Run through that list and compute the accuracy for each threshold. Repeat for the F1 score.
  • Using either argmin() or argmax(), find the optimal threshold for accuracy, and for F1.