
Document question and answering

Document question and answering is a multimodal machine learning task: a model analyzes an image of a document, such as a contract, converts it to text, and answers a question about that text. This is useful for searching large collections of scanned documents, such as financial records.

Build a pipeline for document question and answering, then ask the pre-loaded question "Which meeting is this document about?".

The pipeline function from the transformers library and the question are already loaded for you. Note that this exercise uses its own versions of the pipeline and dqa functions, so you can learn how to use them without some of the extra setup. See the Hugging Face documentation to dive deeper.

This is a part of the course

“Working with Hugging Face”


Exercise instructions

  • Create a pipeline for the document-question-answering task and save it as dqa.
  • Save the path to the image, document.png, as image.
  • Use the dqa pipeline to answer the question about the image, and save the output as results.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create the pipeline
____ = ____(____="____", model="naver-clova-ix/donut-base-finetuned-docvqa")

# Set the image and question
____ = "____"
question = "Which meeting is this document about?"

# Get the answer
____ = ____(image=____, question=____)

print(results)
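
Outside the exercise environment, the same steps might look like the sketch below. It assumes the standard pipeline function from transformers, a local file named document.png, and that the model's extra dependencies (for example Pillow and a deep learning backend) are installed. The helper names top_answer and run_document_qa are hypothetical, and the course's own pipeline and dqa functions may behave differently.

```python
def top_answer(results):
    """Return the answer string from a document-QA result, or None if empty."""
    # Assumption: the pipeline returns a dict, or a list of dicts,
    # where each dict has an "answer" key.
    if isinstance(results, dict):
        results = [results]
    return results[0]["answer"] if results else None


def run_document_qa(image="document.png",
                    question="Which meeting is this document about?"):
    """Build the pipeline, ask the question about the image, return the answer."""
    # Imported here so top_answer can be used without transformers installed.
    from transformers import pipeline

    # Create the pipeline for the document-question-answering task
    dqa = pipeline(
        task="document-question-answering",
        model="naver-clova-ix/donut-base-finetuned-docvqa",
    )

    # Get the answer; the image is passed as a path to a local file
    results = dqa(image=image, question=question)
    return top_answer(results)
```

Calling run_document_qa() downloads the model on first use, so it can take a while. top_answer accepts both a single dict and a list of dicts, because the exact shape of the result can vary between models and library versions.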

Skill level: Intermediate

Navigate and use the extensive repository of models and datasets available on the Hugging Face Hub.

In this chapter, you'll apply pipeline methodologies to new tasks using image and audio data. Specifically, you will learn ways to process these types of data in preparation for tasks such as classification, question and answering, and automatic speech recognition.

  • Exercise 1: Processing and classifying images
  • Exercise 2: Processing image data
  • Exercise 3: Creating an image classifier
  • Exercise 4: What about the original image?
  • Exercise 5: Question answering and multi-modal tasks
  • Exercise 6: Document question and answering
  • Exercise 7: Visual question and answering
  • Exercise 8: Audio classification
  • Exercise 9: Resampling audio files
  • Exercise 10: Filtering out audio files
  • Exercise 11: Classifying audio files
  • Exercise 12: Automatic speech recognition
  • Exercise 13: Instantiating an ASR pipeline
  • Exercise 14: Word error rate
  • Exercise 15: Iterating over a dataset
