What is Cortex Search

1. What is Cortex Search

Good to see you again. In this video, we're going to dive into a more detailed discussion of what Cortex Search is. You'll learn how Cortex Search works and the elements that make it far superior to keyword search or semantic search used on their own. We'll talk about who gets access to the search service and how we set this up. And we'll discuss data freshness, getting data into the service, document parsing, text splitting, and cost considerations.
We'll start with a discussion of what Cortex Search is. Cortex Search is a search service that uses state-of-the-art hybrid search, backed by powerful open-source embedding models, to find only the most relevant information you need to answer your queries. It's built to power a wide array of search experiences, including an enterprise search bar. But today, we'll focus on using it for RAG. If we recall the structure of a RAG app, this allows us to actually find, not just fruitlessly search for, the information the LLM needs to answer the query asked of it. Getting this right is the key to a high-quality RAG app that doesn't hallucinate, which means making up answers that are incorrect.
We'll use Cortex Search for the retrieval stage of RAG. As the retrieval stage, Cortex Search is optimized for low latency and works very well when paired with an LLM in a RAG app. And it's a managed service, which means that Cortex Search abstracts away a lot of the work that you'd have to do to set up and maintain retrieval. So by using Cortex Search, you don't have to worry about manually embedding text, keeping the services online and fast, maintaining infrastructure, or re-indexing the retriever when you get new data. All of that is abstracted away.
So as we mentioned before in Module 1, Cortex Search takes a hybrid search approach. This is a fancy way of saying that it combines two search methods to get the best results.
One method, keyword search, allows us to find exact mentions of words and phrases so we don't miss out on highly specific terminology. The other, semantic search, helps us discover when a document answers our question, but with different words than those used in the original query. Helpfully, the semantic search used by Cortex Search is powered by a state-of-the-art open-source embedding model trained by Snowflake called Arctic Embed. I'll drop a link in the reading about Arctic Embed so you can learn all about it. Cortex Search combines the results of both methods and reorders them based on relevance using another model called a re-ranker. The re-ranker is a relatively expensive operation, which is why we can't just run it on our entire knowledge base. But it's the key ingredient that makes hybrid search so effective.
You might want to stop and think for a second about access controls when you're getting ready to ship your RAG app to real users. You probably want to make sure that only the users who have access to the underlying data can get answers from your app. This is one of the nice things about creating a Cortex Search service. In order for your users to query your search service, they need to have access to both the service and the underlying data. This is great for preventing security breaches or for creating self-contained internal apps. For example, if I have a RAG app that answers questions related to the company's internal finances, I probably want to make sure only employees in the finance department can query the app.
In more specific terms, to query Cortex Search, the user must be explicitly granted access to the Cortex Search service along with the underlying database and schema. You do this by granting privileges with the GRANT USAGE command for the database, schema, and search service. In this module, you're working in your own trial account, and you have all of these privileges already as the developer.
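As a rough sketch of the grants just described, the short Python helper below assembles the three GRANT USAGE statements as text. The database, schema, service, and role names are hypothetical placeholders, and the sketch assumes the privileges are granted to a role that your users hold. It only builds the statements; they would still need to be run in Snowflake by someone allowed to grant them.

```python
from typing import List


def grant_usage_statements(database: str, schema: str, service: str, role: str) -> List[str]:
    """Build the GRANT USAGE statements a role needs to query a search service.

    All identifiers passed in are hypothetical examples, not names from the course.
    """
    qualified_schema = f"{database}.{schema}"
    qualified_service = f"{qualified_schema}.{service}"
    return [
        # Access to the database that holds the service
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        # Access to the schema that holds the service
        f"GRANT USAGE ON SCHEMA {qualified_schema} TO ROLE {role};",
        # Access to the Cortex Search service itself
        f"GRANT USAGE ON CORTEX SEARCH SERVICE {qualified_service} TO ROLE {role};",
    ]


# Hypothetical finance example: only the finance role gets all three grants.
statements = grant_usage_statements("finance_db", "reports", "finance_search", "finance_analyst")
for statement in statements:
    print(statement)
```

Leaving out any one of the three statements would leave the role unable to query the service, which is the point of requiring access to both the service and the data behind it.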
Permissions are not an issue right now. But once you start deploying your own apps to users, this will become much more important, and it's easy to do using the GRANT USAGE command.
The next important thing to think about when we're talking about the real implications of building a search service that will be used in production is how we make sure that we get the right answers based on new information. If new information is added to a knowledge base weekly, daily, or hourly, we want to be able to ask questions of it and not query incomplete or old data. The Cortex Search service has a parameter called TARGET_LAG that allows us to control this freshness. Let's take a quick look at this. Notice that in this cell, we are creating the search service, and we have the TARGET_LAG parameter set to one minute. This sets the Cortex Search service to check for updates to the base table about once every minute. When the search service is created, it will create the index and automatically update it based on the target lag that we set. This means the search results should be no more than about one minute behind the base table, or whatever target lag we set.
These updates operate similarly to a dynamic table. If the underlying table is a candidate for incremental refresh, Cortex Search can check for changes and update only those changes in the index. Incremental updates to your data are critical for maintaining a fresh search and meeting your users' need for accuracy. What's correct today may not be correct tomorrow.
Updating an index little by little sounds simple, but it can cause problems because of how these systems are designed. Index structures like graphs or trees are built to organize data in a way that makes searching really fast and accurate. But when you add new information piece by piece, it can mess up this organization, making searches slower and less reliable. So here's why this happens.
These systems rely on balanced, optimized connections to work well. For example, when you're looking for similar items, the system checks the neighborhood of related data points in the vector space. If new data is added without reorganizing everything, these neighborhoods can get messy. Weak or random connections might form, and the system has a harder time grouping similar items or finding the best match. This makes the whole process less efficient and less accurate over time. So to combat these challenges, Snowflake can optimize the refresh by batching updates within your chosen target lag, and in some cases, complete a full refresh when needed to retain optimal query performance.
Before we talk about searching unstructured data, we need to start with how to get the data in. If you completed our Intro to Generative AI with Snowflake course, you might remember that Snowflake offers a number of task-specific LLM functions, which we used in that course to complete common tasks with GenAI. In this module, to help with preparing data for search, we'll introduce two new task-specific functions, PARSE_DOCUMENT and SPLIT_TEXT_RECURSIVE_CHARACTER.
PARSE_DOCUMENT combines optical character recognition, or OCR, capabilities with machine learning models to identify text content, information stored in tables, and the structural elements of PDF documents. You can use the PARSE_DOCUMENT function to extract text and document layout to build information retrieval systems on large archives of business documents, and to load the extracted information into structured Snowflake tables for use by your applications. For text-heavy documents, we can stick with OCR mode. When your documents have embedded elements like tables, or when you want to maintain the structural integrity of the documents during parsing, you would be better off using LAYOUT mode. This can be ideal for preparing your data for RAG, as retaining the semantic structure of the original documents can lead to improved search performance.
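To illustrate the choice between the two modes, here is a small Python sketch that builds a PARSE_DOCUMENT query as text. The stage name and file names are hypothetical, and the helper is only an assumption about how you might wrap the call; check the function's current documentation before relying on the exact argument form.

```python
def parse_document_sql(stage: str, relative_path: str, preserve_layout: bool) -> str:
    """Build a SELECT that parses one staged file.

    stage and relative_path are hypothetical examples. LAYOUT mode is chosen when
    tables and document structure should be preserved; OCR mode otherwise.
    """
    mode = "LAYOUT" if preserve_layout else "OCR"
    return (
        "SELECT SNOWFLAKE.CORTEX.PARSE_DOCUMENT("
        f"@{stage}, '{relative_path}', {{'mode': '{mode}'}}"
        ") AS parsed;"
    )


# A report with tables: keep the structure.
layout_query = parse_document_sql("docs_stage", "annual_report.pdf", preserve_layout=True)
# A text-heavy memo: plain OCR is enough.
ocr_query = parse_document_sql("docs_stage", "memo.pdf", preserve_layout=False)
print(layout_query)
print(ocr_query)
```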
Once we've parsed the documents, we get out a lot of text, and depending on the document length, this could be unwieldy. We need to split this text up into smaller, bite-sized chunks to make them useful. The SPLIT_TEXT_RECURSIVE_CHARACTER function splits a long string into shorter strings recursively, to preprocess text for use with text embedding or search indexing functions. The function returns an array of text chunks, where the chunks are derived from the original text based on the input parameters provided. The splitting algorithm attempts to split text on separators in the order they are provided, either implicitly as defaults based on the format, or explicitly from the separators argument. Splitting is then applied recursively to each chunk that is longer than the specified chunk size, until all chunks are no longer than the specified chunk size. For RAG, we'll usually also want to include substantial chunk overlap. This allows each chunk to retain context about the chunk immediately before and immediately after it. This way, when we retrieve a chunk, we don't risk missing out on the information we need. Chunk overlap helps us avoid FOMO.
Another thing you need to consider is cost. Cortex Search primarily incurs cost for serving and embedding. Serving refers to the use of multi-tenant serving compute. This is separate from a user-provided virtual warehouse. You pay based on the size of your indexed data. This includes the size of your data plus its vector embeddings. And this is charged per gigabyte, per month, and metered by the second. Embedding costs additionally apply when documents are added to the search service to create their vector embeddings. Embedding is processed incrementally, so the embedding cost is incurred only for added or changed documents. Interesting how this all works under the hood.
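To make the splitting idea concrete, here is a minimal Python sketch of recursive character splitting with overlap. It illustrates the technique and is not Snowflake's implementation: the function names, the default separators, and the way overlap is added (prefixing each chunk with the tail of the previous one) are all assumptions made for this example.

```python
from typing import List, Sequence

# Assumed separator order: paragraphs first, then lines, then words.
DEFAULT_SEPARATORS = ("\n\n", "\n", " ")


def split_recursive(text: str, chunk_size: int,
                    separators: Sequence[str] = DEFAULT_SEPARATORS) -> List[str]:
    """Split text into chunks no longer than chunk_size, trying separators in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to fixed-width cuts.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, remaining = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small parts into one chunk
            continue
        if current:
            chunks.append(current)
        if len(part) <= chunk_size:
            current = part
        else:
            # This part is still too long: recurse with the finer separators.
            chunks.extend(split_recursive(part, chunk_size, remaining))
            current = ""
    if current:
        chunks.append(current)
    return chunks


def add_overlap(chunks: List[str], overlap: int) -> List[str]:
    """Prefix each chunk with the last `overlap` characters of the previous chunk.

    Overlapped chunks can exceed the original chunk size by up to `overlap`.
    """
    result = []
    for i, chunk in enumerate(chunks):
        prefix = chunks[i - 1][-overlap:] if i > 0 and overlap > 0 else ""
        result.append(prefix + chunk)
    return result


sample = (
    "Cortex Search is a managed service.\n\n"
    "It combines keyword and semantic search.\n\n"
    "A re-ranker orders the results."
)
base_chunks = split_recursive(sample, chunk_size=50)
overlapped = add_overlap(base_chunks, overlap=10)
for chunk in overlapped:
    print(repr(chunk))
```

In this sample each paragraph fits within the 50-character limit, so the split happens on the paragraph separator alone; the finer separators only come into play when a single paragraph is too long.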
In this video, we covered how Cortex Search works directly with text, abstracting away the need for you to manually generate embeddings. This abstraction saves you tons of time and gets you to the app deployment step much faster. In the next video, you'll learn the specifics of loading unstructured data into a Snowflake stage to begin the creation of your own Cortex Search service. The hands-on work starts now. See you in the next video.

2. Let's practice!
