
Using evaluate metrics

It's time to evaluate your LLM that classifies customer support interactions. Picking up where you left off with your fine-tuned model, you'll now use a new validation dataset to assess its performance.

Some interactions and their corresponding labels have been loaded for you as validate_text and validate_labels. The model and tokenizer are also loaded.
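If you want to reproduce this setup locally, the sketch below shows one way the outputs referenced in the instructions could be produced. It assumes model is a Hugging Face sequence classification model (so its output exposes a logits attribute) and that validate_text is a list of strings; these details are assumptions, not part of the original exercise.

import torch

# Tokenize the validation texts into a padded batch of tensors
inputs = tokenizer(
    validate_text,
    padding=True,
    truncation=True,
    return_tensors="pt",
)

# Run a forward pass without tracking gradients (inference only)
with torch.no_grad():
    outputs = model(**inputs)

# outputs.logits has shape (num_examples, num_classes)
print(outputs.logits.shape)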

This exercise is part of the course

Introduction to LLMs in Python


Exercise instructions

  • Extract the predicted labels from the model logits found in the outputs.
  • Compute the four loaded metrics by comparing real (validate_labels) and predicted labels.

Hands-on interactive exercise

Try this exercise by completing the sample code.

import evaluate

# Load the four classification metrics
accuracy = evaluate.load("accuracy")
precision = evaluate.load("precision")
recall = evaluate.load("recall")
f1 = evaluate.load("f1")

# Extract the new predictions
predicted_labels = ____

# Compute the metrics by comparing real and predicted labels
print(____(____=____, predictions=predicted_labels))
print(____(____=____, predictions=predicted_labels))
print(____(____=____, predictions=predicted_labels))
print(____(____=____, predictions=predicted_labels))
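For reference, here is a minimal sketch of one completed solution. It assumes the logits live in outputs.logits and that validate_labels is a list of integer class IDs; for more than two classes, precision, recall, and f1 would also need an average argument (e.g. average="macro").

import torch

# Predicted class = index of the largest logit for each example
predicted_labels = torch.argmax(outputs.logits, dim=-1)

# Compute each metric by comparing real and predicted labels
print(accuracy.compute(references=validate_labels, predictions=predicted_labels))
print(precision.compute(references=validate_labels, predictions=predicted_labels))
print(recall.compute(references=validate_labels, predictions=predicted_labels))
print(f1.compute(references=validate_labels, predictions=predicted_labels))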