PDF document loaders
To begin implementing Retrieval Augmented Generation (RAG), you'll first need to load the documents that the model will access. These documents can come from a variety of sources, and LangChain supports document loaders for many of them.
In this exercise, you'll use a document loader to load a PDF document containing the paper "RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture" by Balaguer et al. (2024).
Note: pypdf, a dependency for loading PDF documents in LangChain, has already been installed for you.
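If you run this exercise outside the course environment, pypdf may not be present. The sketch below is one way to check using only the standard library. The helper name `is_installed` is hypothetical and not part of LangChain or the course material.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Report whether the two packages this exercise relies on can be found
for name in ("pypdf", "langchain_community"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If pypdf is reported as missing, it can typically be installed with `pip install pypdf`.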
This exercise is part of the course
Developing LLM Applications with LangChain
Exercise instructions
- Import the appropriate class for loading PDF documents in LangChain.
- Create a document loader for the 'rag_vs_fine_tuning.pdf' document, which is available in the current directory.
- Load the document into memory to view the contents of the first document, or page.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import library
from langchain_community.document_loaders import ____
# Create a document loader for rag_vs_fine_tuning.pdf
loader = ____
# Load the document
data = ____
print(data[0])
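
For reference, here is a sketch of one way the blanks could be filled in. It assumes `PyPDFLoader` is the intended class, which the exercise does not name. The wrapper function `load_pdf_pages` and the file-existence check are illustrative additions, not part of the exercise.

```python
from pathlib import Path


def load_pdf_pages(pdf_path):
    """Return a list with one Document per page of the PDF at pdf_path."""
    # Imported inside the function so that langchain_community and pypdf
    # are only required when the function is actually called
    from langchain_community.document_loaders import PyPDFLoader

    # Assumption: PyPDFLoader is the loader class the exercise expects
    loader = PyPDFLoader(str(pdf_path))
    return loader.load()


if __name__ == "__main__":
    pdf_path = Path("rag_vs_fine_tuning.pdf")
    if pdf_path.exists():
        data = load_pdf_pages(pdf_path)
        # Each Document carries the page text and metadata such as the source file
        print(data[0].page_content[:300])
        print(data[0].metadata)
    else:
        print(f"{pdf_path} not found in the current directory")
```

Printing `data[0]` directly, as the exercise template does, shows both fields at once. Printing `page_content` and `metadata` separately tends to be easier to read for long pages.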