Exercise

Zero-shot learning with CLIP

You will use zero-shot learning to classify an image from the rajuptvs/ecommerce_products_clip dataset, which contains around 2,000 product images with associated descriptions:

[Image: a woman modeling a dress]

The dataset (dataset), CLIPProcessor (processor), and CLIPModel (model) have been loaded for you, as well as a list of categories:

categories = ["shirt", "trousers", "shoes", "dress", "hat", 
              "bag", "watch", "glasses", "jacket", "belt"]
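
To reproduce this setup outside the exercise environment, the preloaded objects could be created roughly as follows. This is only a sketch: the exercise does not say which CLIP checkpoint it uses, so openai/clip-vit-base-patch32 is an assumption, and load_resources is a hypothetical helper name.

```python
DATASET_ID = "rajuptvs/ecommerce_products_clip"  # dataset named in the exercise
CHECKPOINT = "openai/clip-vit-base-patch32"      # assumption: checkpoint is not stated


def load_resources(dataset_id=DATASET_ID, checkpoint=CHECKPOINT):
    """Download and return (dataset, processor, model).

    Hypothetical helper; requires the `datasets` and `transformers`
    packages and network access when called.
    """
    # Imported inside the function so that defining it does not
    # require the packages to be installed.
    from datasets import load_dataset
    from transformers import CLIPModel, CLIPProcessor

    dataset = load_dataset(dataset_id)
    processor = CLIPProcessor.from_pretrained(checkpoint)
    model = CLIPModel.from_pretrained(checkpoint)
    return dataset, processor, model
```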

Instructions

  • Use processor to preprocess categories and the image at index 999 of dataset, with padding enabled.
  • Pass the unpacked inputs into the model.
  • Calculate the probability of each category by applying the .softmax() method to the .logits_per_image attribute of the model output.
  • Find the most likely category using probs and categories.
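
The four steps above can be sketched as a single function. This is one possible solution, not necessarily the reference answer: classify_zero_shot is a hypothetical name, return_tensors="pt" assumes PyTorch tensors, and the image is passed in as an argument because the exercise does not show how dataset is indexed.

```python
def classify_zero_shot(image, categories, processor, model):
    """Return (predicted_category, probs) for a single image."""
    # Tokenize the category names and preprocess the image in one call;
    # padding=True pads the category texts to a common length.
    inputs = processor(
        text=categories, images=image, return_tensors="pt", padding=True
    )

    # Unpack the processor output into keyword arguments for the model.
    outputs = model(**inputs)

    # logits_per_image has one row per image and one column per category;
    # softmax over dim=1 turns each row into probabilities.
    probs = outputs.logits_per_image.softmax(dim=1)

    # The index of the highest probability selects the category name.
    predicted_category = categories[probs.argmax().item()]
    return predicted_category, probs
```

With the preloaded objects, the image argument would be the item at index 999 of dataset. Assuming a "train" split and an "image" column (neither is confirmed by the exercise text), that would be dataset["train"][999]["image"].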