Automated caption quality assessment
You have accurately classified the image of the dress, but how good was the original description?
Maa Fab wrap with a Trendy design dress with Vibrant color for an elegant touch of Fabric completely Soft and Comfortable wear with amazing design of Solid Boat ? Neck Flared Dress to make a perfect addition to your wardrobe collection.
You will now use the CLIP model to make a quantitative statement about how accurate this description is using the CLIP score. The caption (description), image (image), ToTensor class, and clip_score() function from torchmetrics have been loaded.
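For intuition, CLIP score is commonly defined as the cosine similarity between the CLIP image embedding and the CLIP text embedding, scaled by 100 and floored at 0, so it falls between 0 and 100 and a higher value means a better match. The sketch below applies that formula to toy vectors. The helper name clip_score_from_embeddings and the 3-dimensional vectors are made up for illustration; they are not part of torchmetrics, which computes real embeddings with the model.

```python
import math


def clip_score_from_embeddings(image_emb, text_emb):
    """Return max(100 * cosine_similarity, 0) for one image/text embedding pair."""
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm_image = math.sqrt(sum(a * a for a in image_emb))
    norm_text = math.sqrt(sum(b * b for b in text_emb))
    cosine = dot / (norm_image * norm_text)
    # Negative similarities are floored at 0, so the score stays in [0, 100].
    return max(100.0 * cosine, 0.0)


# Toy vectors standing in for real CLIP embeddings (hypothetical values).
aligned = clip_score_from_embeddings([1.0, 0.0, 0.0], [2.0, 0.0, 0.0])
unrelated = clip_score_from_embeddings([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
opposed = clip_score_from_embeddings([1.0, 0.0, 0.0], [-1.0, 0.0, 0.0])
print(aligned, unrelated, opposed)
```

Embeddings that point in the same direction score 100, while orthogonal or opposed embeddings score 0.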
This exercise is part of the course
Multi-Modal Models with Hugging Face
Exercise instructions
- Convert the image to a PyTorch tensor with intensities ranging from 0-255.
- Use the clip_score() function to assess the quality of the caption, using the openai/clip-vit-base-patch32 model.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Convert the image to a PyTorch tensor
image = ____()(____)____
# Use the clip_score function to assess the quality of the caption
score = ____
print(f"CLIP score: {score}")
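One possible completion of the sample code, wrapped in a reusable helper, might look like the following. It is a sketch rather than the official solution: the function name caption_clip_score is hypothetical, ToTensor is assumed to come from torchvision.transforms, clip_score is assumed to be the functional version in torchmetrics.functional.multimodal, and image is assumed to be a PIL image.

```python
def caption_clip_score(image, description, model_name="openai/clip-vit-base-patch32"):
    """Score how well `description` matches a PIL `image` (higher is better)."""
    # Heavy dependencies are imported lazily so that defining the helper stays cheap.
    # Assumption: these are the import paths behind the preloaded names in the exercise.
    from torchvision.transforms import ToTensor
    from torchmetrics.functional.multimodal import clip_score

    # ToTensor() returns floats in [0, 1]; multiplying by 255 restores the 0-255 range.
    image_tensor = ToTensor()(image) * 255
    # The CLIP checkpoint is downloaded from the Hugging Face Hub on first use.
    return clip_score(image_tensor, description, model_name)


# Usage, assuming `image` and `description` are loaded as in the exercise:
# score = caption_clip_score(image, description)
# print(f"CLIP score: {score}")
```

The multiplication by 255 matters because ToTensor() rescales pixel intensities to the 0-1 range, whereas the exercise asks for a tensor with intensities from 0 to 255.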