Getting answers from data: Using RAG

1. Getting answers from data: Using RAG

Welcome back! In this video, we're going to look at how RAG helps us find answers from reams of documents, books, transcripts, and other unstructured data.
Imagine a curious child. This child knows how to speak and understands language very well. The child loves to talk, loves to have conversations, and loves to explain things to others. Well, at least when the child knows about the topic. When you ask the child about something they know nothing about, they say "I don't know," or they might even make something up. This is kind of like an LLM.
Now imagine if we could give this child a library card and a bus ticket to the nearest library. Imagine further that the child is assisted by a librarian who knows exactly which books, articles, or newspapers to grab, and which specific parts of those to read. Once the child is done, they run back from the library, combine their language ability with the new knowledge they learned, and answer your question correctly, citing sources from their reading. Smart kid.
This is similar to how RAG works. The child has some pre-learned knowledge and a strong language ability, just like a language model that has learned knowledge from pre-training. In our story, the child is great at communicating but struggles with hard facts, especially those they have never been exposed to. The library the child runs to is the set of documents where they can look up the specific facts or information they need to answer the question. And when we ask our RAG apps a question in natural language, the app retrieves the information it needs, then uses its model's language skills to answer in a way that we can understand.
To stretch the metaphor a bit, this is efficient because the child does not have to carry all of the world's current knowledge in their head to be ready to answer questions. Instead, they can just fetch what they need and answer using the information gathered.
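
To make the library metaphor concrete, here is a minimal sketch of the retrieve-then-answer flow. It is an illustration only: the three-passage LIBRARY, the word-count similarity, and the function names are all assumptions made for this example, and real RAG apps typically use embeddings and a vector store rather than word overlap. The final call to a language model is left out; the sketch stops at the prompt that would be sent to it.

```python
import math
import re
from collections import Counter

# Hypothetical "library": a few passages standing in for real documents.
LIBRARY = [
    "The refund policy allows returns within 30 days of purchase.",
    "Our support team is available on weekdays from 9am to 5pm.",
    "Shipping to Europe usually takes five to seven business days.",
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, documents, k=1):
    """The 'librarian': return the k passages most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine_similarity(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question, passages):
    """Combine the retrieved passages with the question for the language model."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"

question = "What is the refund policy?"
passages = retrieve(question, LIBRARY, k=1)
prompt = build_prompt(question, passages)
print(prompt)
```

Here the retrieval step plays the librarian, and the prompt is what the child reads before answering: the model only ever sees the passages that were fetched, not the whole library.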
In an idealized world, we would be able to give all of our information to our LLMs at inference time so that they could answer every question correctly, but there are important limitations that stand in our way. One problem is that context windows are small compared to the data that we want to query. To work around this, RAG pulls out only the most relevant information and delivers that into the context window. Even with an infinitely large context window, another problem is that the LLM would still have trouble sifting through that volume of data. LLMs often struggle to recall information from the middle of their context window. This is often called the lost-in-the-middle problem. Perhaps one day we'll be able to solve this and have LLMs that can generate answers losslessly. Even in that case, it would be obscenely expensive to process all of our tokens each time we call the model. Until then, we have RAG.
In this video, we introduced RAG, Retrieval Augmented Generation, as the way we ask questions of unstructured data. We covered this at a very high level, and we'll cover it in detail in an upcoming video. I wanted to paint a picture in your head of why we need RAG and some of the useful things it can do for us when we're building question-answering apps. In the next video, we'll look at text-to-SQL and how it can be used to build the same sort of apps, but this time for data sitting in a structured format. See you in a moment.
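
As a rough sketch of the context-window workaround described in this video, the snippet below keeps only the highest-scoring chunks that fit within a fixed budget. Everything in it is assumed for illustration: the relevance scores are made up, the budget of 20 is arbitrary, and counting whitespace-separated words is only a stand-in for a real tokenizer.

```python
def count_tokens(text):
    """Rough stand-in for a real tokenizer: count whitespace-separated words."""
    return len(text.split())

def pack_context(scored_chunks, budget):
    """Greedily keep the highest-scoring chunks that fit within the budget."""
    selected, used = [], 0
    for score, chunk in sorted(scored_chunks, key=lambda pair: pair[0], reverse=True):
        cost = count_tokens(chunk)
        if used + cost <= budget:
            selected.append(chunk)
            used += cost
    return selected, used

# Hypothetical (score, chunk) pairs, as if produced by a retrieval step.
scored_chunks = [
    (0.91, "Refunds are accepted within 30 days of purchase."),
    (0.15, "Our offices are closed on public holidays."),
    (0.78, "A receipt is required for every refund request."),
    (0.40, "Gift cards cannot be exchanged for cash at any store location."),
]

context, used = pack_context(scored_chunks, budget=20)
print(f"Kept {len(context)} chunks using {used} of 20 budgeted tokens")
for chunk in context:
    print("-", chunk)
```

A smaller, more relevant context also means fewer tokens to pay for on every call, which is the cost argument made above.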

2. Let's practice!
