Getting answers from data: Using text-to-SQL

1. Getting answers from data: Using text-to-SQL

Hi again. Now let's talk about another tool that we have in our tool belt, Text-to-SQL. And let's go back to the story about the child and the library from earlier. This time, let's think about what a high school student would do when exploring a new library for their first time. Unlike before, when the child was running in with a question, grabbing different books, articles, and newspapers that they needed on their own, and running away, this time the high school student has access to the library's data management system and can query the library records. In this case, the student has to learn about how to use the library's software interface, and they must know all of the identifying and cataloging information, and how to query this information before they have any chance of finding the answer to their question. Luckily, the librarian is there to assist with understanding how the software works and how the data is organized. In this case, the librarian represents the data analyst that we ask our questions to, since we don't have the skills to dig through the data to find the answers ourselves. Now what happens in real life, for most of us, is that we ask the librarian our question. They already know how to translate our natural language query into the information that they would need to look up the answer to our question. And once they have the answer, they give us the resulting data back. This is how text-to-SQL works. But just like the student needs the librarian to understand the catalog and how to use the library's tools to find answers, LLMs need to interpret data schemas and need to be able to query information relationships with precision. If an LLM misunderstands the instructions that are given, or does not know how to look up the correct information to formulate the answer, then the response it gives will be incorrect. So let's move on from the library back into the business world. It goes without saying that accuracy in any response to a business intelligence query is vitally important. LLMs, even those trained for SQL generation, have difficulty understanding real-world databases and schemas, especially when these databases often have hundreds of tables and require complex joins to query correctly, at least when the LLM is operating alone. This is an important challenge to overcome. Otherwise, our text-to-SQL application will return inaccurate information and not be able to answer our queries. And this is a hard problem. BirdSQL, a popular benchmark for this problem, evaluates different text-to-SQL models with over 12,000 cross-domain query SQL pairs covering 95 large databases. It currently shows the leading text-to-SQL model lags behind human performance by more than 17 percentage points at about 75% accuracy. And for most business intelligence apps, 75% doesn't cut it. In this example, the importance of a query engine being able to accurately view and query the entire volume of data to answer questions is pretty evident. So what's the solution? One way is to provide the LLM more context about the contents of your tables and how they relate to the business questions that you want to answer. In some ways, this is analogous to the RAG setup we discussed in the last video, except you'll need to provide the context needed to generate accurate SQL. All too often, data scientists and AI engineers are criticized for building the best apps and models that lay idle in notebooks rather than living in the hands of users. And don't worry, we won't let your apps die in a notebook. Data is useful to no one if it's behind a barrier. In this course, you'll learn how to get your apps to users by building a conversational front end that you'll develop and deploy using Streamlit. Lots coming up. In this video, we looked at how to get answers from structured data using Text-to-SQL. Being able to generate accurate SQL queries from natural language will help us as we develop our user-facing apps and overcome the limits of leading Text-to-SQL models. This will be very useful when we build a conversational front end with Streamlit in the upcoming modules. Now hopefully you have a good picture of the building blocks in your head and are thinking about all of the ways that you'll be able to use them for your work. In the next video, we'll talk about who will benefit the most from this course and go into more specific detail on the specific apps that we'll build in the later modules. See you then.

2. Let's practice!

Create Your Free Account

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.