
Pipeline caption generation

In this exercise, you will use the same Flickr dataset as before, which contains 30,000 images with associated captions (a 10-image subset is loaded here). This time, you will generate a caption for the following image using a pipeline, rather than importing the model and its model-specific preprocessing class by hand.

[Image: a man standing on a ladder cleaning a window]

The dataset (dataset) has been loaded with the following structure:

Dataset({
    features: ['image', 'caption', 'sentids', 'split', 'img_id', 'filename'],
    num_rows: 10
})
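
For reference, each datapoint can be indexed directly; here is a brief sketch, assuming the structure above, showing that the image feature decodes to a PIL image the pipeline can consume as-is:

# Inspect datapoint 3; the 'image' feature decodes to a PIL image
sample = dataset[3]
print(type(sample["image"]))  # a PIL image class
print(sample["caption"])      # the reference caption(s) for this image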

The pipeline function (pipeline) from the transformers library has also been loaded.

This exercise is part of the course Multi-Modal Models with Hugging Face.


Exercise instructions

  • Load the image-to-text pipeline with the Salesforce/blip-image-captioning-base pretrained model.
  • Use the pipeline to generate a caption for the image at index 3.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Load the image-to-text pipeline
pipe = ____

# Use the pipeline to generate a caption for the image at index 3
pred = ____

print(pred)
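
If you get stuck, here is a minimal sketch of one possible completion, assuming dataset and pipeline are preloaded as described above (the import is shown for self-containment, and the sample output is illustrative only):

from transformers import pipeline

# Load the image-to-text pipeline with the BLIP captioning checkpoint
pipe = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

# Pass the PIL image at index 3 straight to the pipeline
pred = pipe(dataset[3]["image"])

print(pred)  # e.g. [{'generated_text': 'a man on a ladder cleaning a window'}]

The pipeline bundles the model and its preprocessor behind one call, which is why no model-specific imports are needed.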