Prompting Vision Language Models (VLMs)
Over the next two exercises, you'll use a multi-modal model to analyze the sentiment of a news article and its corresponding headline image from the BBC News dataset on Hugging Face.

To start, you will prepare a chat template for the model that includes both the image and the news article. The dataset (dataset) and headline image (image) have been loaded.
This exercise is part of the course
Multi-Modal Models with Hugging Face
Exercise instructions
- Load the news article content (content) from the datapoint at index 6 in the dataset.
- Complete the text query to insert content into text_query using f-strings.
- Add the image and text_query to the chat template, specifying the content type of text_query as "text".
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Load the news article content from datapoint 6
content = ____
# Complete the text query
text_query = f"Does the news article have a positive, negative, or neutral impact on championship winning chances: {____}. Provide reasoning."
# Add the text query dictionary to the chat template
chat_template = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": ____,
            },
            ____
        ],
    }
]
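
For reference, here is one way the completed template could look. It is a minimal sketch, not the official solution. It assumes each record stores the article body under a "content" key and that the text entry carries its payload under a "text" key, which is the convention commonly used by Hugging Face multi-modal chat templates. The dataset and image below are hypothetical stand-ins so the snippet runs on its own; in the exercise both objects are already loaded.

```python
# Hypothetical stand-ins for the preloaded objects (illustration only).
# In the exercise, `dataset` is the BBC News dataset and `image` is the headline image.
dataset = [{"content": f"Placeholder article {i}"} for i in range(10)]
image = "headline_image_placeholder"

# Load the news article content from datapoint 6
# (assumes the article body lives under a "content" key)
content = dataset[6]["content"]

# Insert the article into the text query with an f-string
text_query = (
    "Does the news article have a positive, negative, or neutral impact "
    f"on championship winning chances: {content}. Provide reasoning."
)

# A single user turn whose content list holds the image entry, then the text entry
chat_template = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": text_query},  # "text" payload key is an assumption
        ],
    }
]
```

Keeping the image and the text as separate typed entries in the same content list lets the model's processor tell which parts of the user turn are visual input and which are the written prompt.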