Understanding BM25
Before you start integrating a BM25 sparse retriever into your RAG architecture, it's best to test it on some short strings to get a intuition for how the retriever selects the documents.
You've been provided with three strings that you'll use as the basis for your BM25 retriever. The functionality required for this exercise is already loaded for you.
This exercise is part of the course
Retrieval Augmented Generation (RAG) with LangChain
Exercise instructions
- Initialize the BM25 retriever from the documents; configuring it to retrieve three documents at a time.
- Invoke the retriever on the query provided.
- Print the page content of the first result.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
chunks = [
"RAG stands for Retrieval Augmented Generation.",
"Graph Retrieval Augmented Generation uses graphs to store and utilize relationships between documents in the retrieval process.",
"There are different types of RAG architectures; for example, Graph RAG."
]
# Initialize the BM25 retriever
bm25_retriever = ____.from_texts(____)
# Invoke the retriever
results = bm25_retriever.____("Graph RAG")
# Extract the page content from the first result
print("Most Relevant Document:")
print(____)