
Exercise

Evaluating with ROUGE

ROUGE is commonly used to evaluate summarization tasks because it checks for overlap between predictions and references. You have been provided with a model-generated summary, predictions, and a reference summary, references, to validate against. Calculate the scores to see how well the model performed.

The evaluate library has been loaded for you.

Instructions

  • Load the ROUGE metric.
  • Calculate the ROUGE scores between the predicted and reference summaries.
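A minimal sketch of these two steps with the evaluate library is shown below. The predictions and references lists here are illustrative placeholders; in the exercise, both variables are already provided for you.

```python
import evaluate

# Placeholder summaries for illustration; in the exercise,
# `predictions` and `references` are already defined for you.
predictions = ["The model summarizes the article in a few short sentences."]
references = ["A short human-written summary of the same article."]

# Load the ROUGE metric
rouge = evaluate.load("rouge")

# Calculate the ROUGE scores between the predicted and reference summaries
results = rouge.compute(predictions=predictions, references=references)

# The result includes ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-Lsum scores
print(results)
```

Each score ranges from 0 to 1, and higher values indicate greater n-gram overlap between the predicted and reference summaries.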