Precision-Recall trade-off
When working with classification tasks, the term Precision-Recall trade-off often appears. Where does it come from?
Usually, a document is assigned to the class with the highest predicted probability. But what if the maximum probability is only 0.1? Should you consider that the document belongs to this class with only 10% probability?
The answer depends on the problem at hand. It is possible to add a minimum threshold for accepting a classification, and as the threshold changes, precision and recall typically move in opposite directions.
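To make the opposite movement concrete, here is a minimal sketch on made-up data. The scores, the labels, and the helper name precision_recall are all hypothetical and are not part of the exercise; the example looks at a single class of interest and computes both metrics by hand with NumPy.

```python
import numpy as np

# Hypothetical scores for one class of interest, with made-up true labels
# (1 = the document really belongs to the class, 0 = it does not).
scores = np.array([0.95, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20])
y_true = np.array([1, 1, 0, 1, 0, 1, 0, 0])


def precision_recall(y_true, scores, threshold):
    """Return (precision, recall) when accepting only scores >= threshold."""
    predicted = scores >= threshold          # documents accepted into the class
    actual = y_true == 1                     # documents that truly belong to it
    true_positives = np.sum(predicted & actual)
    n_predicted = predicted.sum()
    # Precision is undefined when nothing is accepted; report 0.0 in that case.
    precision = true_positives / n_predicted if n_predicted else 0.0
    recall = true_positives / actual.sum()
    return float(precision), float(recall)


results = {t: precision_recall(y_true, scores, t) for t in (0.5, 0.8)}
for threshold, (precision, recall) in results.items():
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data, raising the threshold from 0.5 to 0.8 lifts precision from 0.60 to 1.00 while recall drops from 0.75 to 0.50. Real models usually show the same tendency, though the change is not guaranteed to be monotonic at every threshold.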
The y_true variable and the model model are loaded. Also, if the maximum probability is lower than the threshold, the document will be assigned to DEFAULT_CLASS (chosen to be class 2).
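The fallback rule described above can be sketched on a small array of made-up probabilities. Everything here is an assumption for illustration: the probability values, the helper name predict_with_threshold, and the choice to represent DEFAULT_CLASS as the integer label 2. It uses np.where as a vectorized alternative to a per-row loop.

```python
import numpy as np

DEFAULT_CLASS = 2  # assumption: the fallback class is the integer label 2


def predict_with_threshold(probabilities, threshold, default_class=DEFAULT_CLASS):
    """Pick the most probable class per row, or default_class if it is too uncertain."""
    probabilities = np.asarray(probabilities)
    best_class = probabilities.argmax(axis=1)   # index of the highest probability
    best_prob = probabilities.max(axis=1)       # value of the highest probability
    return np.where(best_prob >= threshold, best_class, default_class)


# Hypothetical class probabilities for four documents and three classes.
toy_probabilities = np.array([
    [0.90, 0.05, 0.05],
    [0.55, 0.30, 0.15],
    [0.10, 0.85, 0.05],
    [0.40, 0.35, 0.25],
])

toy_pred_50 = predict_with_threshold(toy_probabilities, 0.5)
toy_pred_80 = predict_with_threshold(toy_probabilities, 0.8)
print(toy_pred_50)
print(toy_pred_80)
```

With the stricter threshold, the second document (maximum probability 0.55) is no longer accepted and falls back to the default class, which is the mechanism that shifts precision and recall.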
This exercise is part of the course
Recurrent Neural Networks (RNNs) for Language Modeling with Keras
Exercise instructions
- Use the .predict() method to get the probabilities for each class and store them in the pred_probabilities variable.
- Accept the maximum probability only if it is greater than or equal to 0.5 and store the results in the y_pred_50 variable.
- Use the np.argmax() and np.max() functions to do the same for a threshold equal to 0.8.
- Print the trade_off variable with all the metrics.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Get probabilities for each class
pred_probabilities = model.____(X_test)
# Thresholds at 0.5 and 0.8
y_pred_50 = [np.argmax(x) if np.max(x) >= ____ else DEFAULT_CLASS for x in pred_probabilities]
y_pred_80 = [np.____(x) if np.____(x) >= 0.8 else DEFAULT_CLASS for x in pred_probabilities]
# Per-class precision and recall at each threshold
trade_off = pd.DataFrame({
    'Precision_50': precision_score(y_true, y_pred_50, average=None),
    'Precision_80': precision_score(y_true, y_pred_80, average=None),
    'Recall_50': recall_score(y_true, y_pred_50, average=None),
    'Recall_80': recall_score(y_true, y_pred_80, average=None)},
    index=['Class 1', 'Class 2', 'Class 3'])
____