Zero-shot learning with CLIP
You will use zero-shot learning to classify an image from the rajuptvs/ecommerce_products_clip
dataset, which contains around 2,000 images of products along with associated descriptions.
The dataset (dataset), CLIPProcessor (processor), and CLIPModel (model) have been loaded for you, as well as a list of categories:
categories = ["shirt", "trousers", "shoes", "dress", "hat",
"bag", "watch", "glasses", "jacket", "belt"]
This exercise is part of the course Multi-Modal Models with Hugging Face.
Exercise instructions
- Use the processor to preprocess the categories and the image at index 999 of dataset; enable padding.
- Pass the unpacked inputs into the model.
- Calculate the probabilities of each category using the .logits_per_image attribute and the .softmax() method.
- Find the most likely category using probs and categories.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Preprocess the categories and image
inputs = ____(text=____, images=____, return_tensors="pt", padding=____)
# Process the unpacked inputs with the model
outputs = ____
# Calculate the probabilities of each category
probs = outputs.____.____(dim=1)
# Find the most likely category
category = categories[probs.____.item()]
print(f"Predicted category: {category}")