Pipeline caption generation
In this exercise, you will use the same Flickr dataset as before, which contains 30,000 images and their associated captions. This time, you will generate a caption for one of these images using a pipeline, instead of importing the model and the preprocessing class specific to that model.
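For comparison, the model-specific route from the earlier exercise looks roughly like this. This is a minimal sketch, assuming the standard BLIP classes in transformers and a PIL image named image:

from transformers import BlipProcessor, BlipForConditionalGeneration

# Model-specific preprocessing class and model class
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# image is assumed to be a PIL image from the dataset
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))

The pipeline you build below wraps all of these steps into a single call.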
The dataset (dataset) has been loaded with the following structure:
Dataset({
features: ['image', 'caption', 'sentids', 'split', 'img_id', 'filename'],
num_rows: 10
})
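Each row pairs an image with its caption(s), so you can inspect a datapoint before captioning it. A quick sketch, using the feature names from the structure above:

# Look at datapoint 3 before captioning it
example = dataset[3]
print(type(example["image"]))  # a PIL image
print(example["caption"])      # the reference caption(s)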
The pipeline function (pipeline) has been loaded.
Exercise instructions
- Load the image-to-text pipeline with the Salesforce/blip-image-captioning-base pretrained model.
- Use the pipeline to generate a caption for the image at index 3.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Load the image-to-text pipeline
pipe = ____
# Use the pipeline to generate a caption for the image at index 3
pred = ____
print(pred)
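If you want to check your answer, one possible completion is shown below. The task and checkpoint names come straight from the instructions; note that the image-to-text pipeline returns a list of dictionaries with a generated_text key:

from transformers import pipeline

# Load the image-to-text pipeline with the BLIP captioning checkpoint
pipe = pipeline(task="image-to-text", model="Salesforce/blip-image-captioning-base")

# Generate a caption for the image at index 3
pred = pipe(dataset[3]["image"])
print(pred)  # e.g. [{'generated_text': '...'}]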