
Using evaluate metrics

It's time to evaluate the LLM that classifies customer support interactions. Picking up where you left off with your fine-tuned model, you'll now use a new validation dataset to assess its performance.

Some interactions and their corresponding labels have been loaded for you as validate_text and validate_labels. The model and tokenizer are also loaded.
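
Before computing metrics, it can help to see where these objects come from. Below is a minimal sketch, assuming a Hugging Face transformers sequence-classification model; the checkpoint name, example texts, and labels are illustrative placeholders, not the course's actual data.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased"  # placeholder checkpoint (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

validate_text = ["My order never arrived.", "Thanks, the issue is resolved!"]  # illustrative
validate_labels = [1, 0]  # illustrative

# Tokenize the validation texts and run a forward pass without gradients
inputs = tokenizer(validate_text, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.logits.shape)  # one row of logits per interaction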

This exercise is part of the course

Introduction to LLMs in Python


Exercise instructions

  • Extract the predicted labels from the model logits stored in outputs.
  • Compute the four loaded metrics by comparing the real labels (validate_labels) with the predicted labels.

Hands-on interactive exercise

Have a go at this exercise by completing the sample code below; a possible completion is sketched after it.

import evaluate

# Load the four evaluation metrics
accuracy = evaluate.load("accuracy")
precision = evaluate.load("precision")
recall = evaluate.load("recall")
f1 = evaluate.load("f1")

# Extract the new predictions
predicted_labels = ____

# Compute the metrics by comparing real and predicted labels
print(____(____=____, predictions=predicted_labels))
print(____(____=____, predictions=predicted_labels))
print(____(____=____, predictions=predicted_labels))
print(____(____=____, predictions=predicted_labels))
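
For reference, here is one possible completion, assuming outputs is a PyTorch model output carrying a logits tensor. If your labels span more than two classes, precision, recall, and f1 also need an averaging mode (e.g. average="macro") passed to compute().

import torch

# Predicted class = index of the largest logit for each example
predicted_labels = torch.argmax(outputs.logits, dim=-1)

# Compute each metric by comparing real and predicted labels
print(accuracy.compute(references=validate_labels, predictions=predicted_labels))
print(precision.compute(references=validate_labels, predictions=predicted_labels))
print(recall.compute(references=validate_labels, predictions=predicted_labels))
print(f1.compute(references=validate_labels, predictions=predicted_labels))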