
Vision language models: multi-modal setup

Over the next two exercises, you'll use a multi-modal model to analyze the sentiment of a news article and its corresponding headline image from the BBC News dataset on Hugging Face.

(Figure: BBC News dataset card)

To start, you will prepare a chat template for the model that includes both the image and the news article. The dataset and the headline image have already been loaded as dataset and image.

This exercise is part of the course Multi-Modal Models with Hugging Face.


Exercise instructions

  • Load the news article text (the content field) of the datapoint at index 6 in dataset into content.
  • Complete the f-string in text_query so that it inserts content into the query.
  • Add the image and text_query to the chat template, specifying the content type of text_query as "text".


Have a go at this exercise by completing this sample code.

# Load the news article content from datapoint 6
content = ____

# Complete the text query
text_query = f"Does the news article have a positive, negative, or neutral impact on championship winning chances: {____}. Provide reasoning."

# Add the image and the text query dictionary to the chat template
chat_template = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": ____,
            },
            ____
        ],
    }
]
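For reference, here is one way the completed template might look. This is a minimal sketch, not the official solution: it assumes each datapoint behaves like a dict with a "content" key (indexing a Hugging Face Dataset by row returns such a dict) and that text parts use the common {"type": "text", "text": ...} shape. The small stand-in dataset and placeholder image are hypothetical, added only so the snippet runs offline; in the exercise, the real dataset and image are already loaded.

```python
# Hypothetical stand-ins so this sketch runs offline. In the exercise,
# `dataset` (BBC News from Hugging Face) and `image` are preloaded.
dataset = [{"content": f"Article body number {i}."} for i in range(10)]
image = object()  # placeholder for the headline image (e.g. a PIL image)

# Step 1: read the article text from the datapoint at index 6
content = dataset[6]["content"]

# Step 2: insert the article text into the query with an f-string
text_query = (
    "Does the news article have a positive, negative, or neutral impact "
    f"on championship winning chances: {content}. Provide reasoning."
)

# Step 3: one user turn whose content mixes an image part and a text part
chat_template = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": text_query},
        ],
    }
]

print(chat_template[0]["content"][1]["type"])
```

Keeping the image and text as separate typed parts in one user message lets the model's processor place the image tokens and the text tokens in the order they appear in the list.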