Loading PDF files for RAG
To begin implementing Retrieval Augmented Generation (RAG), you'll first need to load the documents that the model will access. These documents can come from a variety of sources, and LangChain supports document loaders for many of them.
In this exercise, you'll use a document loader to load a PDF document containing the paper Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks by Lewis et al. (2021). This file is available for you as 'rag_paper.pdf'.
Note: pypdf, a dependency for loading PDF documents in LangChain, has already been installed for you.
This exercise is part of the course Retrieval Augmented Generation (RAG) with LangChain.
Exercise instructions
- Import the appropriate class for loading PDF documents in LangChain.
- Create a document loader for the 'rag_paper.pdf' document.
- Load the document into memory to view the contents of the first document, or page.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import library
from langchain_community.document_loaders import ____
# Create a document loader for rag_paper.pdf
loader = ____
# Load the document
data = ____
print(data[0])
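
One possible way to fill in the template above is sketched below. It assumes the PyPDFLoader class from langchain_community.document_loaders is the intended loader, which fits the note about pypdf but is not stated in the exercise. The helper name load_pdf_pages and the file-existence check are illustrative additions, not part of the exercise.

```python
from pathlib import Path


def load_pdf_pages(path):
    """Load a PDF with LangChain and return one Document per page.

    `load_pdf_pages` is a hypothetical helper name used for illustration.
    """
    # Fail early with a clear message if the file is missing (an added check).
    if not Path(path).is_file():
        raise FileNotFoundError(f"PDF not found: {path}")

    # Imported here so the error above is raised before LangChain is needed.
    # Assumption: PyPDFLoader is the class the exercise expects.
    from langchain_community.document_loaders import PyPDFLoader

    # Create a document loader for the PDF
    loader = PyPDFLoader(str(path))

    # Load the document into memory; each page becomes one Document
    return loader.load()
```

Calling data = load_pdf_pages('rag_paper.pdf') followed by print(data[0]) would then show the first page. Each returned Document exposes the page text as page_content and details such as the source path and page number in metadata.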