Prompting Vision Language Models (VLMs)

Over the next two exercises, you'll use a multi-modal model to analyze the sentiment of a news article and its corresponding headline image from the BBC News dataset on Hugging Face:

BBC News dataset card

To start, you will prepare a chat template for the model that includes both the image and the news article. The dataset (dataset) and headline image (image) have been loaded.

This exercise is part of the course

Multi-Modal Models with Hugging Face

Exercise instructions

  • Load the news article content (content) from the datapoint at index 6 in the dataset.
  • Complete text_query by inserting content into it with an f-string.
  • Add the image and text_query to the chat template, specifying the content type of text_query as "text".

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Load the news article content from datapoint 6
content = ____

# Complete the text query
text_query = f"Does the news article have a positive, negative, or neutral impact on championship winning chances: {____}. Provide reasoning."

# Add the text query dictionary to the chat template
chat_template = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": ____,
            },
            ____
        ],
    }
]
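
For reference, one way the blanks above might be filled in is sketched below. This is not the official solution. It assumes the article text lives under a "content" key of each datapoint, and that the model's chat template expects text entries shaped like {"type": "text", "text": ...}. The dataset and image objects here are hypothetical stand-ins so the snippet runs on its own; in the exercise both are already loaded.

```python
# Hypothetical stand-ins for the preloaded objects (assumptions for illustration).
# In the exercise, `dataset` is the BBC News dataset and `image` is the headline image.
dataset = [{"content": f"Placeholder article text {i}"} for i in range(7)]
image = object()  # placeholder for the loaded headline image

# Load the news article content from datapoint 6
# (the "content" key is an assumption about the dataset's column name)
content = dataset[6]["content"]

# Insert the article content into the query with an f-string
text_query = f"Does the news article have a positive, negative, or neutral impact on championship winning chances: {content}. Provide reasoning."

# One user turn carrying both the image and the text query
chat_template = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": image,
            },
            # The "text" key for the query is an assumed convention
            {"type": "text", "text": text_query},
        ],
    }
]
```

Keeping the image and the text in the same "content" list lets the model see them as parts of a single user message, rather than as separate turns.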