
Using evaluate metrics

It's time to evaluate your LLM that classifies customer support interactions. Picking up where you left off with your fine-tuned model, you'll now use a new validation dataset to assess its performance.

Some interactions and their corresponding labels have been loaded for you as validate_text and validate_labels. The model and tokenizer are also loaded.
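The exercise code below refers to outputs, the model's predictions on the validation texts. As a minimal sketch of how such outputs are typically produced with the loaded model and tokenizer (the exact preprocessing in the course environment may differ):

import torch

# Tokenize the validation texts into padded tensors
inputs = tokenizer(validate_text, return_tensors="pt",
                   padding=True, truncation=True)

# Run the model without tracking gradients to get classification logits
with torch.no_grad():
    outputs = model(**inputs)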

This exercise is part of the course

Introduction to LLMs in Python


Exercise instructions

  • Extract the predicted labels from the model logits found in the outputs.
  • Compute the four loaded metrics by comparing real (validate_labels) and predicted labels.

Hands-on interactive exercise

Try this exercise by completing the sample code.

import evaluate
import torch

# Load the four evaluation metrics
accuracy = evaluate.load("accuracy")
precision = evaluate.load("precision")
recall = evaluate.load("recall")
f1 = evaluate.load("f1")

# Extract the predicted labels from the logits in the model outputs
predicted_labels = torch.argmax(outputs.logits, dim=-1)

# Compute the metrics by comparing real and predicted labels
print(accuracy.compute(references=validate_labels, predictions=predicted_labels))
print(precision.compute(references=validate_labels, predictions=predicted_labels))
print(recall.compute(references=validate_labels, predictions=predicted_labels))
print(f1.compute(references=validate_labels, predictions=predicted_labels))
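One caveat worth knowing: the evaluate implementations of precision, recall, and F1 default to binary averaging, so the calls above assume two classes. If your support interactions span more than two labels, pass an explicit averaging strategy; "macro" below is just one illustrative choice:

# For multiclass labels, binary averaging raises an error; specify
# an averaging strategy instead (e.g. "macro", "micro", "weighted")
print(f1.compute(references=validate_labels,
                 predictions=predicted_labels,
                 average="macro"))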