
Document question and answering

Document question and answering is a multimodal ML task that analyzes an image of a document, such as a contract, converts it to text, and lets you ask a question about that text. This is useful when many scanned documents, for example financial records, need to be searched.

Build a pipeline for document question and answering, then ask the pre-loaded question "Which meeting is this document about?".

The pipeline function from the transformers library and the question are already loaded for you. Note that this exercise uses its own versions of the pipeline and dqa functions so that you can learn how to use them without some of the extra setup. Visit the Hugging Face documentation to dive deeper.
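Outside the course environment, the same call can be made with the real transformers library. The following is a minimal sketch, not the course's own wrapper: the helper names build_dqa_pipeline and ask_document are hypothetical, and it assumes that transformers, a backend such as PyTorch, and the extra packages this model needs (likely Pillow and sentencepiece) are installed. The model is downloaded the first time the pipeline is built.

```python
def build_dqa_pipeline(model_name="naver-clova-ix/donut-base-finetuned-docvqa"):
    """Create a document question answering pipeline (hypothetical helper)."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    return pipeline(task="document-question-answering", model=model_name)


def ask_document(image_path, question, dqa=None):
    """Ask a question about a document image (hypothetical helper).

    dqa can be any callable that accepts image= and question= keyword
    arguments; when it is omitted, the real pipeline is built, which
    downloads the model on first use.
    """
    if dqa is None:
        dqa = build_dqa_pipeline()
    return dqa(image=image_path, question=question)


# Example call (assumes document.png exists in the working directory):
# results = ask_document("document.png", "Which meeting is this document about?")
```

Accepting the pipeline as an optional argument lets you build it once and reuse it across many documents, rather than reloading the model on every call.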

This is part of the course “Working with Hugging Face”.


Exercise instructions

  • Create a pipeline for document-question-answering and save it as dqa.
  • Save the path to the image, document.png, as image.
  • Get the answer to the question about the image using the dqa pipeline and save it as results.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create the pipeline
____ = ____(____="____", model="naver-clova-ix/donut-base-finetuned-docvqa")

# Set the image and question
____ = "____"
question = "Which meeting is this document about?"

# Get the answer
____ = ____(image=____, question=____)

print(results)
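Printing results shows the raw pipeline output. As a rough sketch, assuming the output is either a single dictionary or a list of dictionaries that each carry an "answer" key (which is what the transformers document question answering pipeline typically returns, though the course's wrapper may differ), the answer text can be pulled out like this. The helper name extract_answers is hypothetical.

```python
def extract_answers(results):
    """Return the answer strings from pipeline output (hypothetical helper).

    Assumes results is a dict or a list of dicts, each of which may
    contain an "answer" key; entries without that key are skipped.
    """
    if isinstance(results, dict):
        results = [results]
    return [item["answer"] for item in results if "answer" in item]


# Example call with the variable from the exercise:
# print(extract_answers(results))
```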