
Optimizing the threshold

You heard that the default threshold of 0.5 maximizes accuracy in theory, but you want to test what happens in practice. So you try a number of different threshold values, compute the accuracy each one yields, and from that determine the best-performing threshold. You then repeat the experiment for the F1 score. Is 0.5 the optimal threshold? Is the optimal threshold the same for accuracy and for the F1 score? Go ahead and find out!

You have a scores matrix, scores, obtained by scoring the test data. The ground-truth labels for the test data are also available as y_test. Finally, two numpy functions are preloaded, argmin() and argmax(), which return the index of the minimum and maximum value in an array respectively, along with the metrics accuracy_score() and f1_score().

This exercise is part of the course

Designing Machine Learning Workflows in Python


Exercise instructions

  • Create a range of equally spaced threshold values that includes 0.0, 0.25, 0.5, 0.75 and 1.0.
  • Using a nested list comprehension, store the predictions for each threshold value in the range above. Recall that you can obtain labels from a scores matrix for a threshold thr using [s[1] > thr for s in scores].
  • Run through that list and compute the accuracy for each threshold. Repeat for the F1 score.
  • Using either argmin() or argmax(), find the optimal threshold for accuracy, and for F1.
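
To make the thresholding step concrete, here is a minimal sketch on a tiny hand-made scores matrix. The values and the helper name labels_at are hypothetical and not part of the exercise. The sketch assumes each row holds the scores for class 0 and class 1, as the expression s[1] in the instructions suggests.

```python
# Hypothetical scores: each row is [score for class 0, score for class 1].
scores = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7], [0.05, 0.95]]

def labels_at(thr, scores):
    """Label a sample positive when its class-1 score exceeds thr."""
    return [s[1] > thr for s in scores]

# Raising the threshold turns more predictions negative.
for thr in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(thr, labels_at(thr, scores))
```

Because the comparison is strict, a threshold of 1.0 can never label a sample positive, so the two ends of the range give the extreme cases: (almost) everything positive at 0.0 and everything negative at 1.0.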

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Create a range of equally spaced threshold values
t_range = ____

# Store the predicted labels for each value of the threshold
preds = [[____ > thr for s in scores] for ____ in ____]

# Compute the accuracy for each threshold
accuracies = [____(____, ____) for p in preds]

# Compute the F1 score for each threshold
f1_scores = [____(____, ____) for p in preds]

# Report the optimal threshold for accuracy, and for F1
print(t_range[____(accuracies)], t_range[____(f1_scores)])
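
For reference, one way the template could be filled in is sketched below. It is not the official solution. The exercise preloads scores and y_test, so the synthetic data generated here is purely an assumption to make the sketch self-contained, and np.linspace is one of several ways to build the threshold range.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Synthetic stand-ins for the preloaded objects (assumed, not the course data).
rng = np.random.default_rng(0)
y_test = rng.integers(0, 2, size=200)
# Class-1 scores loosely correlated with the true label, clipped to [0, 1].
p1 = np.clip(0.35 * y_test + rng.normal(0.3, 0.2, size=200), 0, 1)
scores = np.column_stack([1 - p1, p1])

# Five equally spaced thresholds: 0.0, 0.25, 0.5, 0.75, 1.0
t_range = np.linspace(0, 1, 5)

# One list of predicted labels per threshold
preds = [[s[1] > thr for s in scores] for thr in t_range]

# Score every list of predictions against the ground truth
accuracies = [accuracy_score(y_test, p) for p in preds]
f1_scores = [f1_score(y_test, p) for p in preds]

# Both metrics are "higher is better", so argmax picks the best threshold
print(t_range[np.argmax(accuracies)], t_range[np.argmax(f1_scores)])
```

Since higher is better for both metrics, argmax() is the relevant function here; argmin() would return the worst-performing threshold. At a threshold of 1.0 no sample is labeled positive, so the F1 score is 0 there and scikit-learn may emit a warning about an undefined precision.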