
Using evaluate metrics

It's time to evaluate your LLM that classifies customer support interactions. Picking up where you left off with your fine-tuned model, you'll now use a new validation dataset to assess its performance.

Some interactions and their corresponding labels have been loaded for you as validate_text and validate_labels. The model and tokenizer are also loaded.

This exercise is part of the course

Introduction to LLMs in Python


Exercise instructions

  • Extract the predicted labels from the model logits found in the outputs.
  • Compute the four loaded metrics by comparing real (validate_labels) and predicted labels.

Hands-on interactive exercise

Try this exercise by completing the sample code below.

import evaluate

# Load the four evaluation metrics
accuracy = evaluate.load("accuracy")
precision = evaluate.load("precision")
recall = evaluate.load("recall")
f1 = evaluate.load("f1")

# Extract the new predictions
predicted_labels = ____

# Compute the metrics by comparing real and predicted labels
print(____(____=____, predictions=predicted_labels))
print(____(____=____, predictions=predicted_labels))
print(____(____=____, predictions=predicted_labels))
print(____(____=____, predictions=predicted_labels))
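To see what these four metrics actually measure, here is a minimal, self-contained sketch that computes them by hand for binary labels. In the exercise environment you would instead extract predictions with something like `torch.argmax(outputs.logits, dim=-1)` and call each loaded metric's `compute(references=..., predictions=...)` method; the toy labels below are hypothetical and exist only for illustration.

```python
# Hand-rolled versions of accuracy, precision, recall, and F1 for binary
# labels, mirroring what evaluate.load(...).compute(...) returns.
# In the exercise, predictions would come from the model, e.g.:
#   predicted_labels = torch.argmax(outputs.logits, dim=-1).tolist()
#   print(accuracy.compute(references=validate_labels, predictions=predicted_labels))

def binary_metrics(references, predictions):
    # Count the confusion-matrix cells for the positive class (label 1)
    tp = sum(1 for r, p in zip(references, predictions) if r == 1 and p == 1)
    fp = sum(1 for r, p in zip(references, predictions) if r == 0 and p == 1)
    fn = sum(1 for r, p in zip(references, predictions) if r == 1 and p == 0)
    correct = sum(1 for r, p in zip(references, predictions) if r == p)

    accuracy = correct / len(references)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical validation labels and model predictions for illustration
validate_labels = [1, 0, 1, 1, 0, 1]
predicted_labels = [1, 0, 0, 1, 0, 1]
print(binary_metrics(validate_labels, predicted_labels))
```

Note that precision and recall pull in different directions: here the model never predicts a false positive (precision 1.0) but misses one true positive (recall 0.75), and F1 is their harmonic mean.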