Vision Language models: multi-modal setup
Over the next two exercises, you'll use a multi-modal model to analyze the sentiment of a news article and its corresponding headline image from the BBC News dataset on Hugging Face.
To start, you will prepare a chat template for the model that includes both the image and the news article. The dataset (dataset) and headline image (image) have been loaded.
This exercise is part of the course Multi-Modal Models with Hugging Face.
Exercise instructions
- Load the news article content (content) from the datapoint at index 6 in the dataset.
- Complete the text query to insert content into text_query using f-strings.
- Add the image and text_query to the chat template, specifying the content type of text_query as "text".
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Load the news article content from datapoint 6
content = ____

# Complete the text query
text_query = f"Does the news article have a positive, negative, or neutral impact on championship winning chances: {____}. Provide reasoning."

# Add the text query dictionary to the chat template
chat_template = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": ____,
            },
            ____
        ],
    }
]
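
For orientation, here is a minimal sketch of how the blanks might be filled in. It runs on stand-in data rather than the real preloaded objects, so several details are assumptions: the dataset is imitated with a list of dictionaries, the field name "content" is inferred from the exercise wording, the image is a placeholder object, and the "text" key in the text entry follows a common chat-message convention that your model's processor may or may not share.

```python
# Stand-in for the preloaded dataset (assumption: each datapoint behaves
# like a dict with a "content" field holding the article text).
dataset = [{"content": f"Placeholder article {i}"} for i in range(10)]

# Stand-in for the preloaded headline image (in the exercise this would be
# a real image object rather than a bare placeholder).
image = object()

# Load the news article content from datapoint 6
content = dataset[6]["content"]

# Insert the article content into the text query with an f-string
text_query = (
    "Does the news article have a positive, negative, or neutral impact "
    f"on championship winning chances: {content}. Provide reasoning."
)

# One user turn containing an image entry followed by a text entry
# (assumption: the text entry stores its string under the "text" key).
chat_template = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": text_query},
        ],
    }
]
```

Both inputs sit in the same user message, which is how the model receives the image and the article as a single query.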