Building a RAG with Cortex Search

1. Building a RAG with Cortex Search

Before we head back to our hands-on practice and complete our RAG with the search service, I want to recap the goal: what we're aiming to build. We start with the user interface. Once the user enters a query, it is sent to the RAG backend that we've built. The retriever uses both semantic and keyword search under the hood. Remember, this is what makes it a hybrid search. The semantic search is more of a fuzzy search, where the retriever identifies relevant chunks based on their meaning, and the keyword search makes sure we don't miss chunks that use key terminology.

After the hybrid search returns the data, the retriever passes the content and associated metadata to the re-ranker. The re-ranker further evaluates the results and puts them in a top-N order, which is a subset of the top K that the retriever passed to it. The re-ranked results can then be passed to the LLM to generate a response to the query, which is passed back to the user with or without associated feedback or references.

Everything from the semantic and keyword search to the re-ranker is abstracted away for you by Cortex Search, and we've built this already. Once we get our search results, they're just passed to an LLM to complete the retrieval-augmented generation and can be served up in a front end like Streamlit. This is what we'll add next: the generation step. Now, let's go build it.

Now that we have set up Cortex Search as our retriever, we can add Cortex Complete for generation to build our RAG. We'll also add TruLens instrumentation to our app. The first thing we want to do here is turn on OpenTelemetry tracing. OpenTelemetry allows us to collect what are called spans, which represent units of work in our system. In our case, the spans will include information about the retriever and the LLM call, along with key metadata like what data went in and came out, usage information like tokens and cost, and more.
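To make the retrieval flow from the recap concrete, here is a toy sketch of hybrid search followed by re-ranking. It is not how Cortex Search is implemented, since the service abstracts those details away. Everything in it is an assumption made for illustration: the chunk texts, the made-up three-number vectors standing in for real embeddings, the 50/50 score weighting, and the helper names hybrid_search and rerank.

```python
import math
import re

# Hypothetical toy corpus. Each chunk carries a made-up 3-d vector standing in
# for a real embedding; the texts and numbers are illustrative only.
CHUNKS = [
    {"text": "Inflation outlook for 2024", "vec": [0.9, 0.1, 0.0]},
    {"text": "Expected path of consumer prices next year", "vec": [0.8, 0.2, 0.1]},
    {"text": "Staffing notes from the March meeting", "vec": [0.0, 0.1, 0.9]},
    {"text": "Appendix: inflation data tables", "vec": [0.3, 0.1, 0.6]},
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_score(query, text):
    """Fraction of query terms that appear verbatim in the chunk."""
    terms = tokenize(query)
    return len(terms & tokenize(text)) / len(terms) if terms else 0.0


def cosine(a, b):
    """Cosine similarity, used here as the stand-in 'semantic' score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(query, query_vec, chunks, top_k, alpha=0.5):
    """Blend semantic and keyword scores, then keep the top K chunks."""
    def score(chunk):
        semantic = cosine(query_vec, chunk["vec"])
        keyword = keyword_score(query, chunk["text"])
        return alpha * semantic + (1 - alpha) * keyword

    return sorted(chunks, key=score, reverse=True)[:top_k]


def rerank(candidates, score_fn, top_n):
    """Re-score the candidates and keep the top N (a subset of the top K)."""
    return sorted(candidates, key=score_fn, reverse=True)[:top_n]


query, query_vec = "inflation 2024", [1.0, 0.0, 0.0]
candidates = hybrid_search(query, query_vec, CHUNKS, top_k=3)
# A real re-ranker is usually a model that scores (query, chunk) pairs;
# plain cosine similarity is only a placeholder for it here.
final = rerank(candidates, lambda c: cosine(query_vec, c["vec"]), top_n=2)

for chunk in final:
    print(chunk["text"])
```

In this toy data, the second chunk shares no words with the query and is found only through the semantic score, while the appendix chunk stays among the candidates largely because of its keyword match. That is the complementary behavior the hybrid approach is meant to provide.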
Next, we'll create a new database and schema to store our spans and evaluation metrics. Now we can go build our app. We'll do this by creating a Python class called RAG. The RAG will take our retriever, but this time we'll have three methods: retrieve context, which uses our retriever; generate completion, which uses an LLM to answer questions; and query, which puts everything together.

You'll notice that each of these methods has an instrument decorator above it. In the instrument decorator, we include information about the span type and attributes. The span type gives us semantic information about what the method really does. The span attributes give us semantic information about the arguments that go into the method and what comes out. Including all of this metadata makes for much richer spans. We can use that information to visualize spans in specific ways later. It also gives our evals a way to know what data they should run against. For example, context relevance will be computed against the retrieval span attributes.

The second method is generation. This is what I've been promising you'll get from this video. You imported Complete from Snowflake Cortex. Now we'll call it, passing the query and the retrieved context into a template. You can modify this template as you'd like. The template I've given you does a few key things. First, it gives the LLM its role: an expert assistant. It tells it the task, which is to answer questions based on context, and it gives specific instructions not to hallucinate and to say when it doesn't have the information it needs. The template reads: "You are an expert assistant extracting information from the context provided. Answer the question based on the context and do not hallucinate. If you don't have the information, just say so." We take this template along with the filled-in arguments and send it to an LLM.

Then the query method just runs each of the other methods in sequence.
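The class described above can be sketched roughly as follows. This is a minimal, self-contained sketch and not the course notebook. The instrument decorator here is a hypothetical stand-in written for this example, not the TruLens API. The span type and attribute labels, the method names retrieve_context and generate_completion, the in-memory SPANS list, and the stub retriever and LLM are all assumptions. The Context and Question placeholders at the end of the template are also assumed; only the three instruction sentences come from the video.

```python
import functools
import inspect

SPANS = []  # hypothetical in-memory span store, standing in for a trace exporter


def instrument(span_type, attributes):
    """Hypothetical stand-in for a tracing decorator (NOT the TruLens API).

    `attributes` maps a semantic label to an argument name, or to "return"
    for the method's return value.
    """
    def decorator(fn):
        signature = inspect.signature(fn)

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            result = fn(*args, **kwargs)
            values = dict(bound.arguments)
            values["return"] = result
            SPANS.append({
                "span_type": span_type,
                "name": fn.__name__,
                "attributes": {label: values[src] for label, src in attributes.items()},
            })
            return result

        return wrapper

    return decorator


PROMPT_TEMPLATE = (
    "You are an expert assistant extracting information from the context provided.\n"
    "Answer the question based on the context and do not hallucinate.\n"
    "If you don't have the information, just say so.\n\n"
    "Context: {context}\n"
    "Question: {question}\n"
    "Answer:"
)


class RAG:
    def __init__(self, retriever, llm):
        self.retriever = retriever  # callable: query -> list of text chunks
        self.llm = llm              # callable: prompt -> completion text

    @instrument("retrieval", {"query_text": "query", "retrieved_contexts": "return"})
    def retrieve_context(self, query):
        return self.retriever(query)

    @instrument("generation", {"context": "context_chunks", "completion": "return"})
    def generate_completion(self, query, context_chunks):
        prompt = PROMPT_TEMPLATE.format(
            context="\n".join(context_chunks), question=query
        )
        return self.llm(prompt)

    @instrument("root", {"input": "query", "output": "return"})
    def query(self, query):
        # Entry point: run retrieval, then generation, in sequence.
        context_chunks = self.retrieve_context(query)
        return self.generate_completion(query, context_chunks)


# Stubs so the sketch runs without any external service.
def fake_retriever(query):
    return ["Toy chunk about inflation."]


def fake_llm(prompt):
    return "stub answer"


rag = RAG(fake_retriever, fake_llm)
answer = rag.query("How is inflation expected to evolve in 2024?")
print(answer)
print([span["name"] for span in SPANS])
```

In the course setup, the retriever would be the Cortex Search service and the LLM call would be Complete from Snowflake Cortex. Because each span records labeled inputs and outputs, an evaluation such as context relevance can read the query text and retrieved contexts directly from the retrieval span.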
The query method is the entry point to the RAG and what the user will call. Now let's try it out. In the next cell, you'll run the query "How is inflation expected to evolve in 2024?" to check our RAG. We run this and get a pretty good answer about how inflation was expected to change in 2024. This looks good. Here you can see how Complete has used the content that the retriever pulled out of the PDFs to generate a response. Fantastic. Well done. Now we've built the first prototype of our RAG. In the next video, we'll learn how to measure whether our RAG is working well.

2. Let's practice!
