Evaluating with METEOR
METEOR excels at evaluating some of the more semantic features in text. It works similarly to ROUGE by comparing a model-generated output to a reference output. You've been provided these texts as generated and reference; it's over to you to evaluate the score. The evaluate library has been loaded for you.
Exercise instructions
- Compute and print the METEOR score.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
meteor = evaluate.load("meteor")
generated = ["The burrow stretched forward like a narrow corridor for a while, then plunged abruptly downward, so quickly that Alice had no chance to stop herself before she was tumbling into an extremely deep shaft."]
reference = ["The rabbit-hole went straight on like a tunnel for some way, and then dipped suddenly down, so suddenly that Alice had not a moment to think about stopping herself before she found herself falling down a very deep well."]
# Compute and print the METEOR score
results = meteor.compute(predictions=generated, references=reference)
print("Meteor: ", results["meteor"])