Evaluating a pretrained text generation model
The PyBooks team has used a pre-trained GPT-2 model, which you experimented with, to generate text from a given prompt. Now they want to evaluate the quality of this generated text, so they have tasked you with scoring the generated text against a reference text.
BLEUScore and ROUGEScore have been loaded for you.
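Outside the exercise environment, the two metric classes would typically be imported from torchmetrics.text (this assumes torchmetrics is installed):

from torchmetrics.text import BLEUScore, ROUGEScore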
Exercise instructions
- Begin by initializing the two metrics (BLEU and ROUGE) provided by torchmetrics.text.
- Use these initialized metrics to calculate the scores between the generated text and the reference text.
- Display the calculated BLEU and ROUGE scores.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
reference_text = "Once upon a time, there was a little girl who lived in a village near the forest."
generated_text = "Once upon a time, the world was a place of great beauty and great danger. The world of the gods was the place where the great gods were born, and where they were to live."
# Initialize BLEU and ROUGE scorers
bleu = ____()
rouge = ____()
# Calculate the BLEU and ROUGE scores
bleu_score = bleu([____], [[reference_text]])
rouge_score = rouge([generated_text], [[____]])
# Print the BLEU and ROUGE scores
print("BLEU Score:", bleu_score.____())
print("ROUGE Score:", rouge_score)