Opening up unstructured data with RAG
1. Opening up unstructured data with RAG
Welcome back. I'm glad to see you're following along with me. Most of the world's data resides in unstructured formats like chat logs, books, social media posts, emails, and PDFs. Imagine a treasure chest where 80% of the wealth remains untouched. That's what's happening with unstructured data today. It's not just lost value, it's lost opportunities to predict trends, understand our customers, and gain a competitive advantage.
In this module, we'll focus on asking questions of this unstructured data directly. To do this, we'll build a RAG app using Cortex Search as our backend for retrieval and a Cortex LLM for generation. We'll walk through how to create a Cortex Search service and learn how it works under the hood. And we'll do this with a live example, where you'll load unstructured data in the form of PDFs to a Snowflake stage, and then parse and split the documents into chunks that the search service can consume. Then we'll create a Cortex Search service that sits on top of the data we've prepared, and complete the RAG app by tying Cortex Search together with the Cortex LLM. Importantly, we'll also examine how to measure the accuracy of our RAG app with evals. Oh, and did I mention we'll do all of that in a Snowflake Notebook? Excited? I am.
Imagine that I'm a volunteer for our local city council. Every month, the council holds a meeting to discuss the latest happenings in the town. Members of the community come to raise issues about safety, maintenance, and more, and we work with the community to come up with solutions. The problem is that not everyone is able to attend these meetings, and frankly, reading meeting minutes? Really boring. It'd be great if there were a way for everyone in the community to get their questions about the meeting answered. But it can be really time-consuming for the council to spread the word and answer questions from all of our neighbors about the decisions it has made.
And it's critical that the community always gets the latest, accurate information. We often have rumors swirling around the town simply because people aren't up to date on the latest meetings. If we could stand up a fast and accurate conversational assistant for our community, that could be a game-changer for us. It could dramatically improve our ability to involve the community, allow our council members to spend more time crafting innovative solutions to community issues, and make sure everyone always gets fresh, accurate answers to their questions. This is a great candidate for using Retrieval-Augmented Generation, or RAG, to power a community support application.
So now that we've talked about the why, let's talk about the what. Using a Snowflake Notebook, we'll load unstructured data to a Snowflake stage and get our hands dirty setting up a search service that will index the Federal Open Market Committee (FOMC) meeting minutes. FOMC minutes are detailed records of the discussions and actions taken by the Federal Reserve regarding monetary policy. They are released three weeks after each meeting, and they provide insights into the Fed's outlook on the economy and the policy actions it may take or be considering.
Together, we'll parse the FOMC meeting minutes to get text out of the raw PDFs, and we'll split and chunk the text using a recursive character splitter. You'll get hands-on experience doing this, no sitting around. You'll be following along with me in your own Snowflake Notebook, not just watching me do it. You'll create and query a Cortex Search service, and this will be the retriever for your RAG. We'll also examine how to measure the success of retrieval and generation. For this example, we won't have ground truth to aid us, so we'll use an LLM as a judge to grade the success of our RAG, and we won't stop at learning about the metrics.
We'll actually apply these metrics to determine quality for our RAG app. And after that, we'll look at how the search service stays fresh as the underlying data changes. Ready to get started? Let's go!
2. Let's practice!
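Before you open the notebook, it may help to see roughly what a recursive character splitter does. The sketch below is a simplified, pure-Python illustration and not the implementation used by Snowflake or any particular library: the function name `recursive_split`, the default separator list, and the sample text are all assumptions made up for this example. The idea is to try the coarsest separator first (paragraph breaks) and only fall back to finer ones (lines, sentences, words, then a hard cut) for pieces that are still too long. Chunk overlap, which real splitters usually support, is left out to keep it short.

```python
from typing import List, Sequence


def recursive_split(
    text: str,
    chunk_size: int = 200,
    separators: Sequence[str] = ("\n\n", "\n", ". ", " "),
) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Hypothetical helper for illustration only. Separators that fall on a
    chunk boundary are dropped, so a chunk may lose its trailing period.
    """
    text = text.strip()
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut every chunk_size characters.
        cuts = (text[i:i + chunk_size].strip() for i in range(0, len(text), chunk_size))
        return [c for c in cuts if c]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # the piece still fits, keep growing this chunk
            continue
        if current:
            chunks.append(current)  # close the chunk built so far
            current = ""
        if len(piece) <= chunk_size:
            current = piece  # start a new chunk with this piece
        else:
            # A single piece is too long: recurse with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return [c.strip() for c in chunks if c.strip()]


# Placeholder text, not taken from any real meeting minutes.
sample = (
    "First paragraph of a document. It has two sentences.\n\n"
    "Second paragraph, which is somewhat longer and will need to be "
    "broken up further so that every chunk stays under the size limit."
)
for i, chunk in enumerate(recursive_split(sample, chunk_size=60)):
    print(i, len(chunk), repr(chunk))
```

Splitting on natural boundaries first tends to keep related sentences in the same chunk, which generally helps a retriever return passages that make sense on their own.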