Automated caption quality assessment
You have accurately classified the image of the dress, but how good was the original description?
Maa Fab wrap with a Trendy design dress with Vibrant color for an elegant touch of Fabric completely Soft and Comfortable wear with amazing design of Solid Boat ? Neck Flared Dress to make a perfect addition to your wardrobe collection.
You will now use a CLIP model to quantify how accurately this description matches the image by computing its CLIP score. The caption (stored in description), the image (stored in image), the ToTensor class, and the clip_score() function from torchmetrics have already been loaded.
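For context, torchmetrics defines the CLIP score as max(100 × cos(E_I, E_C), 0), where E_I and E_C are the CLIP embeddings of the image and the caption. The sketch below reproduces only that final formula in plain Python, assuming the embeddings have already been computed. The function name clip_score_from_embeddings is hypothetical and is not part of torchmetrics, and the toy 3-dimensional vectors stand in for real CLIP embeddings.

```python
import math
from typing import Sequence


def clip_score_from_embeddings(image_emb: Sequence[float], text_emb: Sequence[float]) -> float:
    """Hypothetical helper: CLIP score from precomputed image and text embeddings.

    Implements max(100 * cos(E_I, E_C), 0).
    """
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm_i = math.sqrt(sum(a * a for a in image_emb))
    norm_t = math.sqrt(sum(b * b for b in text_emb))
    if norm_i == 0 or norm_t == 0:
        raise ValueError("embeddings must be non-zero vectors")
    cosine = dot / (norm_i * norm_t)
    # Scale the cosine similarity by 100 and clamp negative values to 0
    return max(100.0 * cosine, 0.0)


# Toy vectors standing in for real CLIP embeddings (512-dimensional for ViT-B/32)
print(clip_score_from_embeddings([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))   # 100.0 (same direction)
print(clip_score_from_embeddings([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))   # 0.0 (orthogonal)
print(clip_score_from_embeddings([1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]))  # 0.0 (negative, clamped)
```

Because the cosine similarity is bounded by 1, the score lies between 0 and 100; a higher value means the caption and the image sit closer together in CLIP's shared embedding space.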
This exercise is part of the course Multi-Modal Models with Hugging Face.
Exercise instructions
- Convert the image to a PyTorch tensor with intensities ranging from 0 to 255.
- Use the clip_score() function to assess the quality of the caption by comparing image and description with the openai/clip-vit-base-patch32 model.
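The first instruction needs a multiplication because torchvision's ToTensor converts a uint8 image of shape (H, W, C) into a float tensor of shape (C, H, W) whose values are divided by 255, i.e. in the range 0.0 to 1.0; multiplying by 255 restores the 0 to 255 intensity range. The plain-Python sketch below mimics only that reordering and scaling on nested lists. The helpers to_tensor_like and rescale_to_255 are hypothetical and are not torchvision functions.

```python
from typing import List

Pixel = List[int]


def to_tensor_like(image_hwc: List[List[Pixel]]) -> List[List[List[float]]]:
    """Hypothetical stand-in for ToTensor on a uint8 image.

    Reorders (H, W, C) nested lists to (C, H, W) and divides every value by 255.
    """
    height, width, channels = len(image_hwc), len(image_hwc[0]), len(image_hwc[0][0])
    return [
        [[image_hwc[y][x][c] / 255.0 for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


def rescale_to_255(tensor_chw: List[List[List[float]]]) -> List[List[List[float]]]:
    """Multiply every value by 255, mirroring the `* 255` step in the exercise."""
    return [[[v * 255.0 for v in row] for row in channel] for channel in tensor_chw]


# A 1x2 RGB "image": one red pixel and one white pixel
image = [[[255, 0, 0], [255, 255, 255]]]
tensor = to_tensor_like(image)
print(tensor)                  # [[[1.0, 1.0]], [[0.0, 1.0]], [[0.0, 1.0]]]
print(rescale_to_255(tensor))  # [[[255.0, 255.0]], [[0.0, 255.0]], [[0.0, 255.0]]]
```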
Interactive exercise
Complete the sample code to finish this exercise successfully.
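# ToTensor and clip_score are preloaded in the exercise; imported here so the snippet also runs on its own
from torchvision.transforms import ToTensor
from torchmetrics.functional.multimodal import clip_score

# Convert the image to a PyTorch tensor; ToTensor scales to [0, 1], so multiply by 255
image = ToTensor()(image)*255
# Use the clip_score function to assess the quality of the caption
score = clip_score(image, description, "openai/clip-vit-base-patch32")
print(f"CLIP score: {score}")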