
Automated caption quality assessment

You have accurately classified the image of the dress, but how good was the original description?

Maa Fab wrap with a Trendy design dress with Vibrant color for an elegant touch of Fabric completely Soft and Comfortable wear with amazing design of Solid Boat ? Neck Flared Dress to make a perfect addition to your wardrobe collection.

You will now use the CLIP model to quantify how accurately this description matches the image by computing its CLIP score. The caption (description) and image (image) have been loaded, along with the ToTensor class from torchvision and the clip_score() function from torchmetrics.
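As defined in the torchmetrics documentation, the CLIP score is 100 times the cosine similarity between the CLIP image embedding and the CLIP text embedding, clipped at zero: CLIPScore = max(100 * cos(E_I, E_C), 0). A higher score means the caption sits closer to the image in CLIP's shared embedding space. The minimal sketch below applies that formula to plain Python lists. The embedding values are made-up toy numbers, not real CLIP outputs, and the helper name `clip_score_from_embeddings` is hypothetical. In the exercise, clip_score() computes the embeddings with the CLIP model for you.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clip_score_from_embeddings(image_emb, text_emb):
    """Hypothetical helper: CLIP score = max(100 * cos(E_I, E_C), 0)."""
    return max(100.0 * cosine_similarity(image_emb, text_emb), 0.0)


# Toy 3-dimensional embeddings (illustrative only; real CLIP embeddings
# have hundreds of dimensions and come from the image and text encoders)
image_emb = [1.0, 0.0, 0.0]
matching_caption = [2.0, 0.0, 0.0]    # same direction as the image
partial_caption = [1.0, 1.0, 0.0]     # 45 degrees away from the image
unrelated_caption = [-1.0, 0.0, 0.0]  # opposite direction

print(clip_score_from_embeddings(image_emb, matching_caption))            # 100.0
print(round(clip_score_from_embeddings(image_emb, partial_caption), 2))   # 70.71
print(clip_score_from_embeddings(image_emb, unrelated_caption))           # 0.0
```

Because cosine similarity ignores vector length, only the direction of each embedding affects the score, and the clipping at zero means that negatively correlated captions all receive 0.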

This exercise is part of the course Multi-Modal Models with Hugging Face.


Exercise instructions

  • Convert the image to a PyTorch tensor with intensities ranging from 0 to 255.
  • Use the clip_score() function with the openai/clip-vit-base-patch32 model to assess the quality of the caption.
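torchvision's ToTensor converts an H x W x C image with uint8 pixels into a C x H x W float tensor whose values are scaled to the range [0, 1]. Its output therefore has to be multiplied by 255 to recover the 0 to 255 intensities the first instruction asks for. The sketch below mimics that conversion with NumPy on a tiny made-up 2 x 2 RGB image, so the axis reordering and the rescaling can be checked without loading torchvision. The function `to_tensor_like` is a hypothetical stand-in, not the torchvision implementation.

```python
import numpy as np


def to_tensor_like(image_hwc):
    """Hypothetical stand-in for torchvision's ToTensor on a uint8 image:
    reorder H x W x C to C x H x W and scale intensities to [0, 1]."""
    chw = np.transpose(image_hwc, (2, 0, 1))
    return chw.astype(np.float32) / 255.0


# A made-up 2 x 2 RGB image with uint8 pixel values
image = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [128, 128, 128]]],
    dtype=np.uint8,
)

scaled = to_tensor_like(image)  # values in [0, 1], channels first
restored = scaled * 255         # back to the 0 to 255 range

print(scaled.shape)    # (3, 2, 2)
print(scaled.max())    # 1.0
print(restored.max())  # 255.0
```

The channels-first layout matches the [C, H, W] shape that PyTorch image models expect, which is why the conversion changes the axis order as well as the value range.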


Have a go at this exercise by completing this sample code.

# Convert the image to a PyTorch tensor
image = ____()(____)____

# Use the clip_score function to assess the quality of the caption
score = ____

print(f"CLIP score: {score}")