Query Databricks Catalog with Databricks SQL Agent
1. Querying with the Databricks SQL agent
Now, let's create an AI SQL agent that can connect to Databricks data and answer SQL queries.2. Databricks Unity Catalog
We will query AI agents about data stored in Databricks unity catalog. In Unity Catalog, data is organized in "catalogs" that have "schemas", which can contain multiple tables. Tables in a schema can be created from a variety of data sources. Databricks exposes a "samples" catalog that contains multiple test schemas we will use.3. LangChain
To create an AI agent to query data in Databricks, we can use LangChain. LangChain is a library that allows us to build and deploy LLM applications.4. Python LangChain dependencies
To create a LangChain Databricks SQL agent, we first need to pip install the listed Python packages. Once they are installed, we need to import the following functions to create the Databricks SQL agent.5. Databricks SQL Warehouse ID
We need to create a SQL Warehouse to query data on Databricks. Next, we retrieve the warehouse_id from the HTTP path of the SQL warehouse we created.6. Authenticate LangChain to Databricks Workspace
When using LangChain to connect to Databricks outside of the Databricks environment, LangChain looks for the `'DATABRICKS_TOKEN'` and `'DATABRICKS_HOST'` environment variables so it knows which Databricks workspace to authenticate to.7. Databricks SQL agent
The LangChain Databricks SQL agent allows us to use an AI model to answer questions about data stored in a Databricks catalog. The AI agent can run SQL queries to arrive at the answer to our question. If we want the AI agent to show its work, set the `verbose` param to `True`. Here is a sample output from the AI agent to a question on a Databricks catalog.8. Use Databricks SQL agent to query Catalog
Here is an example of how to build a LangChain Databricks SQL agent to query the nyctaxi schema. We will dive into and explain each chunk of code.9. Create a LangChain SQL database from a Databricks Catalog
The first step to creating a LangChain Databricks SQL agent is to create a LangChain `SQLDatabase` object using the `from_databricks()` function. Here, we connect to the `"nyctaxi"` `schema` in the `"samples"` `catalog`. The `warehouse_id` is the identifier for the Databricks workspace to connect to.10. Create a LangChain LLM using a foundational model
To create a custom LangChain LLM that connects to a Databricks foundation model, we instantiate an object of the `ChatDatabricks` class. We can pass in the specific model in the `endpoint` field and adjust common model parameters such as `temperature` and `max_tokens`. `temperature` is a float between 0 and 1 used to specify the randomness in model responses and `max_tokens` is used to specify the maximum number of tokens or words the model should include in the response.11. Create a LangChain SQL database toolkit
The `SQLDatabaseToolkit` object specifies the data source and the AI model that the agent should use for queries. We can use the`db` and `llm` objects that we created earlier.12. Create a Databricks SQL Agent
We pass in the LangChain LLM and the toolkit we created to the `create_sql_agent()` function. By passing in `verbose=True`, we instruct the AI agent to show it's work by providing the SQL queries ran to arrive at its answer to our query. We can pass in a query we are interested in to `agent.run()` and the model will run SQL queries on the data and respond with its answer. Note that the first query will take longer than subsequent queries because it takes time for the initial SQL connection.13. Let's query Databricks data!
Congrats on learning how to create a Databricks SQL agent! Let's use it to ask queries to our data that require SQL!Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.