RAG Refresher
1. RAG Refresher
Welcome back! Let's take your skills to the next level by building agents that work with large document collections.
2. Example: A Smart Cooking Assistant
Imagine you're building a smart cooking assistant for home chefs. To answer questions about recipes, techniques, and meal planning, the assistant needs access to culinary knowledge. But this information isn't stored in one place. It's scattered across recipe collections, technique guides, and meal planning resources. To be effective, the agent needs a way to search through this knowledge base and pull the most relevant details.
3. What is Retrieval Augmented Generation (RAG)?
This is where Retrieval Augmented Generation, or RAG, comes in. Think of RAG as a smart librarian: when you ask a question, it scans through all available documents, finds the most relevant sections, and uses those sections to craft an answer. In short, RAG is a way to combine information retrieval with LLM generation.
4. The RAG Workflow
The RAG workflow has a few steps. First, convert your question into a search query, and find matching document chunks in a vector database. Then, select the top matching chunks - usually 3 to 5 pieces of text. After that, combine your question with these chunks and pass them to a language model that creates a response.
5. Loading and Splitting Documents
Let's walk through how to build this system. Say we have a folder of cooking documentation. First, we need to load our documents and break them into smaller chunks that the model can work with. To do that, we'll use some helpful utilities from the LangChain library. We start with PyPDFDirectoryLoader, which lets us load all the PDFs from a folder, in this case, a "cooking_docs" folder. We pass mode="single" so that each PDF is treated as a single document. Then, we use RecursiveCharacterTextSplitter to break the documents into chunks. A chunk_size of 1000 and a chunk_overlap of 200 ensure each piece is readable and keep important context from being lost.
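Here's a minimal sketch of that loading and splitting step, assuming the "cooking_docs" folder from the example and a recent LangChain install (the langchain-community and langchain-text-splitters packages):

from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load every PDF in the folder, treating each file as one document
loader = PyPDFDirectoryLoader("cooking_docs", mode="single")
documents = loader.load()

# Split into overlapping chunks so context isn't lost at chunk boundaries
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)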
6. Creating a Vector Store
Next, we convert these chunks into searchable vectors and build our retrieval system. We use an embedding model from HuggingFaceEndpointEmbeddings, which converts our documentation chunks and search queries into numeric vectors that capture semantic meaning. These vectors go into a FAISS vector store, which will let us search by similarity rather than exact words.
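Continuing the sketch, and assuming a Hugging Face API token is available in your environment; the model id below is a placeholder for illustration, not something specified in the lesson:

from langchain_huggingface import HuggingFaceEndpointEmbeddings
from langchain_community.vectorstores import FAISS

# Embedding model served by a Hugging Face endpoint
# (placeholder model id; swap in whichever sentence-embedding model you use)
embeddings = HuggingFaceEndpointEmbeddings(model="sentence-transformers/all-MiniLM-L6-v2")

# Index the chunks so we can search them by semantic similarity
vector_store = FAISS.from_documents(chunks, embeddings)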
7. Querying the Vector Store
Then, we can query the vector store with a sample question. The .similarity_search() method retrieves the top three chunks most likely to contain the answer. The final line creates a context string, which joins all retrieved document chunks separated by blank lines for readability. The context string contains the relevant information needed to answer the question. It will be passed to the language model along with the original query to create a complete response.
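A sketch of that query step, continuing from the vector store above (the sample question mirrors the example on the next slide):

# Retrieve the three chunks most likely to contain the answer
query = "How do I cook salmon?"
results = vector_store.similarity_search(query, k=3)

# Join the retrieved chunks into one context string, separated by blank lines
context = "\n\n".join(doc.page_content for doc in results)

# The context and the original question can then be combined into a prompt for the LLM
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"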
8. Example Query: How Do I Cook Salmon?
So, when someone asks about "cooking salmon with herbs", the system can also find relevant sections about "salmon preparation" or "baking salmon in the oven", even though the wording is different.
9. Traditional RAG Pipeline Limitations
But this traditional RAG pipeline has limitations. Consider this complex question: "How do I plan a week of meals under $50 while meeting all nutritional requirements?" The answer isn't in one place; it's spread across different parts of the documentation, and our current RAG setup performs one search and gets a few chunks. If that first search misses details, you get incomplete answers. In the next lessons, we will build code agents that overcome these challenges.
10. Let's practice!
For now, let's practice implementing a traditional RAG pipeline!