
Exercise

Pipeline caption generation

In this exercise, you'll again use the Flickr dataset, which has 30,000 images and associated captions. This time, you'll generate a caption for the following image using a pipeline instead of the auto classes.

[Image: photo of a man standing on a ladder cleaning a window]

The dataset (dataset) has been loaded with the following structure:

Dataset({
    features: ['image', 'caption', 'sentids', 'split', 'img_id', 'filename'],
    num_rows: 10
})

The pipeline function (pipeline) has been loaded.

Instructions

100 XP
  • Load the image-to-text pipeline with the pretrained Salesforce/blip-image-captioning-base model.
  • Use the pipeline to generate a caption for the image at index 3.
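
The two steps above could be sketched roughly as follows. This is a minimal sketch, not the exercise's reference solution: the helper names load_captioner and caption_image are hypothetical, and it assumes the image-to-text pipeline returns a list of dictionaries with a "generated_text" key, which may vary between transformers versions.

```python
def load_captioner(model_name="Salesforce/blip-image-captioning-base"):
    """Build an image-to-text pipeline (hypothetical helper name)."""
    # Imported here so caption_image can be tried without transformers installed.
    from transformers import pipeline

    return pipeline(task="image-to-text", model=model_name)


def caption_image(captioner, dataset, index):
    """Return the generated caption for the image at the given dataset index."""
    # Each row is a dict; the 'image' feature holds the picture itself.
    image = dataset[index]["image"]
    # Assumption: the pipeline returns [{"generated_text": "..."}] for one image.
    outputs = captioner(image)
    return outputs[0]["generated_text"]


# Usage in the exercise environment, where `dataset` is already loaded:
# pipe = load_captioner()
# print(caption_image(pipe, dataset, 3))
```

Keeping the pipeline construction separate from the captioning call means the model is loaded once and can be reused for other indices.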