Using evaluate metrics
It's time to evaluate your LLM that classifies customer support interactions. Picking up from where you left off with your fine-tuned model, you'll now use a new validation dataset to assess its performance.
Some interactions and their corresponding labels have been loaded for you as validate_text and validate_labels. The model and tokenizer are also loaded.
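For orientation, the sketch below shows roughly how outputs could be produced from such a model and tokenizer. Everything named here is illustrative: the exercise preloads model, tokenizer, validate_text, and validate_labels, so the checkpoint and example data below are assumptions, not the course's actual setup.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Illustrative stand-ins for the preloaded objects (assumptions for this sketch)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
validate_text = ["My order arrived damaged.", "Thanks, that fixed it!"]
validate_labels = [1, 0]

# Tokenize the validation texts and run a forward pass without gradient tracking
inputs = tokenizer(validate_text, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

# outputs.logits has shape (num_examples, num_labels)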
This exercise is part of the course Introduction to LLMs in Python.
Exercise instructions
- Extract the predicted labels from the model logits found in outputs.
- Compute the four loaded metrics by comparing real (validate_labels) and predicted labels.
Hands-on interactive exercise
Try this exercise by completing the sample code.
import evaluate

accuracy = evaluate.load("accuracy")
precision = evaluate.load("precision")
recall = evaluate.load("recall")
f1 = evaluate.load("f1")
# Extract the new predictions
predicted_labels = ____
# Compute the metrics by comparing real and predicted labels
print(____(____=____, predictions=predicted_labels))
print(____(____=____, predictions=predicted_labels))
print(____(____=____, predictions=predicted_labels))
print(____(____=____, predictions=predicted_labels))
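One possible completion, offered as a sketch rather than the official solution: it assumes outputs holds the model's output on the tokenized validation texts (as in the setup sketch above) and that the task is binary. For multi-class labels, precision, recall, and f1 would also need an average argument such as average="macro".

import torch

# The predicted label is the index of the largest logit for each example
predicted_labels = torch.argmax(outputs.logits, dim=-1).tolist()

# Compute the metrics by comparing real and predicted labels
print(accuracy.compute(references=validate_labels, predictions=predicted_labels))
print(precision.compute(references=validate_labels, predictions=predicted_labels))
print(recall.compute(references=validate_labels, predictions=predicted_labels))
print(f1.compute(references=validate_labels, predictions=predicted_labels))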