Exercise

Automated caption quality assessment

You have accurately classified the image of the dress, but how good was the original description?

Maa Fab wrap with a Trendy design dress with Vibrant color for an elegant touch of Fabric completely Soft and Comfortable wear with amazing design of Solid Boat ? Neck Flared Dress to make a perfect addition to your wardrobe collection.

You will now use the CLIP model to quantify how well this description matches the image by computing the CLIP score. The caption (description), the image (image), the ToTensor class from torchvision, and the clip_score() function from torchmetrics have been loaded.
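To make the metric concrete: torchmetrics documents the CLIP score as max(100 * cos(E_I, E_C), 0), where E_I and E_C are the CLIP embeddings of the image and the caption, so the score lies between 0 and 100 and higher means a better match. The following is only a minimal sketch of that formula in plain Python. The helper names and the toy vectors are hypothetical, and real embeddings would come from the CLIP model rather than from hand-written lists.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clip_score_from_embeddings(image_embedding, text_embedding):
    """Hypothetical helper: max(100 * cos(E_I, E_C), 0).

    Negative similarities are clipped to 0, so the result is in [0, 100].
    """
    return max(100.0 * cosine_similarity(image_embedding, text_embedding), 0.0)


if __name__ == "__main__":
    # Toy vectors standing in for embeddings (illustrative values only)
    print(clip_score_from_embeddings([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
    print(clip_score_from_embeddings([1.0, 0.0], [0.0, 1.0]))
```

Because only the direction of the embeddings matters, the score reflects how closely the caption and the image align in CLIP's shared embedding space, not the length or fluency of the caption.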

Instructions

  • Convert the image to a PyTorch tensor with intensities ranging from 0 to 255. ToTensor scales pixel values to the range 0 to 1, so the result must be multiplied by 255.
  • Use the clip_score() function to assess the quality of the caption by comparing image_tensor and description with the openai/clip-vit-base-patch32 model.