Document question and answering
Document question and answering is a multi-modal ML task which analyzes an image of a document, such as a contract, converts it to text, and allows a question to be asked about the text. This is useful when there are many scanned documents which need to be searched, for example financial records.
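The search use case described above can be sketched as a loop that asks the same question of every scanned page and then filters the answers by keyword. This is a minimal sketch under stated assumptions. The helper names ask_many and find_matches are hypothetical, and dqa stands for any callable that behaves like a document question answering pipeline. It assumes each call returns a list of dicts with an "answer" key, best answer first, which is the output shape the Hugging Face documentation shows for the Donut checkpoint.

```python
from typing import Callable, Dict, Iterable, List


def ask_many(dqa: Callable[..., List[dict]], paths: Iterable[str], question: str) -> Dict[str, str]:
    """Ask one question of every document image and map each path to its top answer.

    Assumption: each dqa call returns a list of dicts with an "answer" key,
    best answer first. An empty result is recorded as an empty string.
    """
    answers: Dict[str, str] = {}
    for path in paths:
        results = dqa(image=path, question=question)
        answers[path] = results[0]["answer"] if results else ""
    return answers


def find_matches(answers: Dict[str, str], keyword: str) -> List[str]:
    """Return the paths whose answer contains `keyword`, ignoring case."""
    needle = keyword.lower()
    return [path for path, answer in answers.items() if needle in answer.lower()]
```

With a real pipeline, dqa would be the object returned by pipeline(task="document-question-answering", ...), and paths would be the image files of the scanned records. Keeping the pipeline as a parameter lets the search logic be checked without downloading a model.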
Build a pipeline for document question and answering, then ask the pre-loaded question "Which meeting is this document about?".
The pipeline function from the transformers library and the question variable are already loaded for you. Note that we are using our own pipeline and dqa functions so that you can learn how to use them without some of the extra setup. Please visit the Hugging Face documentation to dive deeper.
This is a part of the course “Working with Hugging Face”.
Exercise instructions
- Create a pipeline for document-question-answering and save it as dqa.
- Save the path to the image, document.png, as image.
- Get the answer to the question about the image using the dqa pipeline and save it as results.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create the pipeline
____ = ____(____="____", model="naver-clova-ix/donut-base-finetuned-docvqa")
# Set the image and question
____ = "____"
question = "Which meeting is this document about?"
# Get the answer
____ = ____(image=____, question=____)
print(results)
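One way to fill in the blanks, following the instructions, is dqa = pipeline(task="document-question-answering", model="naver-clova-ix/donut-base-finetuned-docvqa"), then image = "document.png", then results = dqa(image=image, question=question). The sketch below wraps those same calls in functions. The names MODEL, build_dqa and answer_document_question are hypothetical, and the transformers import is deferred so the question-asking step can be tried with a stand-in callable instead of downloading the checkpoint. It uses the standard transformers pipeline rather than the course's own simplified version.

```python
# Sketch: a completed version of the exercise scaffold.
# The transformers import happens inside build_dqa so that
# answer_document_question can be exercised with a stand-in callable.

MODEL = "naver-clova-ix/donut-base-finetuned-docvqa"


def build_dqa():
    """Create the document question answering pipeline (downloads the model)."""
    from transformers import pipeline

    return pipeline(task="document-question-answering", model=MODEL)


def answer_document_question(image, question, dqa=None):
    """Ask one question about one document image and return the raw results.

    `image` may be a file path such as "document.png". `dqa` defaults to the
    real pipeline, but any callable accepting the same keyword arguments works.
    """
    if dqa is None:
        dqa = build_dqa()
    return dqa(image=image, question=question)
```

Calling answer_document_question("document.png", "Which meeting is this document about?") builds the real pipeline and returns its answers, which you can print just like results in the exercise.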
In this chapter, you'll apply pipeline methodologies to new tasks using image and audio data. Specifically, you will learn ways to process these types of data in preparation for tasks such as classification, question and answering and automatic speech recognition.