Lakehouse Federation

1. Lakehouse Federation

What about querying data that lives somewhere else? Lakehouse Federation lets you query external sources directly from Databricks, without importing a single row.

2. The problem: data silos

Most organizations don't keep all their data in one place. You might have operational data in PostgreSQL, analytics in Snowflake, and your lakehouse in Databricks. Getting a complete picture usually means copying data between systems - which is slow, expensive, and creates yet another stale copy. What if you could just query the data where it already lives?

3. What is Lakehouse Federation?

Lakehouse Federation lets you set up connections to external databases and query them directly from Databricks. You define a connection - specifying the host, credentials, and database type - and Databricks registers it in Unity Catalog. From that point, you can run SQL queries against those external tables as if they were local lakehouse tables. The data is never copied into the lakehouse. Databricks pushes as much of the query as it can down to the external system, gets the results, and returns them to your notebook.

4. Setting up a connection

Here's what the setup looks like. You create a connection with the database type, host, and credentials. Then you create a foreign catalog that maps the external database into Unity Catalog's hierarchy. Once that's done, you query external tables using the same three-level naming convention - catalog, schema, table - that you use for everything else. The queries look identical whether the data is in your lakehouse or an external system.
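As a rough sketch of those two steps, the Python helpers below build the SQL statements you might run. The connection name, host, catalog name, and secret scope are all hypothetical, and the option names follow the PostgreSQL connector as I understand it, so check them against the documentation for your source type. In a Databricks notebook you would typically pass each string to spark.sql(); here they are only printed.

```python
def create_connection_sql(name: str, db_type: str, host: str, port: int, secret_scope: str) -> str:
    """Build a CREATE CONNECTION statement (option names assumed for PostgreSQL)."""
    return (
        f"CREATE CONNECTION IF NOT EXISTS {name} TYPE {db_type}\n"
        f"OPTIONS (\n"
        f"  host '{host}',\n"
        f"  port '{port}',\n"
        # Credentials are read from a secret scope rather than written in plain text.
        f"  user secret('{secret_scope}', 'user'),\n"
        f"  password secret('{secret_scope}', 'password')\n"
        f")"
    )


def create_foreign_catalog_sql(catalog: str, connection: str, database: str) -> str:
    """Build a CREATE FOREIGN CATALOG statement that maps one external database."""
    return (
        f"CREATE FOREIGN CATALOG IF NOT EXISTS {catalog}\n"
        f"USING CONNECTION {connection}\n"
        f"OPTIONS (database '{database}')"
    )


def three_level_name(catalog: str, schema: str, table: str) -> str:
    """Return the catalog.schema.table name used to query the table."""
    return f"{catalog}.{schema}.{table}"


# Hypothetical names, for illustration only.
connection_stmt = create_connection_sql(
    "pg_conn", "postgresql", "db.example.com", 5432, "federation_secrets"
)
catalog_stmt = create_foreign_catalog_sql("pg_sales", "pg_conn", "sales")
query_stmt = f"SELECT * FROM {three_level_name('pg_sales', 'public', 'orders')} LIMIT 10"

for stmt in (connection_stmt, catalog_stmt, query_stmt):
    print(stmt, end="\n\n")
```

The final SELECT has the same shape as a query against a local table; only the catalog name tells you the data lives in PostgreSQL.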

5. When to federate vs. when to ingest

Federation isn't always the right answer. It works well when you need real-time access to data that changes often in the source system, the query volume is modest, or compliance rules prevent you from copying the data. But if you're running heavy analytical queries repeatedly against the same external data, federation will be slower than having a local copy. In those cases, ingest the data into your lakehouse - run it through the medallion layers - and query it locally. The decision comes down to freshness versus performance.
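To make that rule of thumb concrete, here is a small, hypothetical helper that encodes it. The function name and the two flags are assumptions made for illustration, not part of any Databricks API, and a real decision would weigh more factors than this.

```python
def choose_strategy(copy_prohibited: bool, heavy_repeated_queries: bool) -> str:
    """Return 'federate' or 'ingest' using the rule of thumb from this lesson."""
    # If compliance rules forbid copying the data, federation is the only option.
    if copy_prohibited:
        return "federate"
    # Heavy analytical queries run repeatedly favor a local copy.
    if heavy_repeated_queries:
        return "ingest"
    # Modest query volume: query the source in place and keep the data fresh.
    return "federate"


print(choose_strategy(copy_prohibited=False, heavy_repeated_queries=True))   # ingest
print(choose_strategy(copy_prohibited=True, heavy_repeated_queries=True))    # federate
print(choose_strategy(copy_prohibited=False, heavy_repeated_queries=False))  # federate
```

When a workload is both heavy and needs fresh data, the helper picks ingestion; that is exactly the freshness-versus-performance trade-off, and it is a judgment call rather than a fixed rule.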

6. Federation in Unity Catalog

Federation also integrates with Unity Catalog. Federated tables appear in the same hierarchy, follow the same access controls, and show up in lineage graphs. You can join a federated PostgreSQL table with a local gold table in a single SQL query - Databricks handles the execution plan across both systems. Supported sources include PostgreSQL, MySQL, SQL Server, and Snowflake, and the list keeps growing.
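A cross-system join might look like the sketch below. All catalog, schema, table, and column names are hypothetical: pg_sales is assumed to be a foreign catalog over PostgreSQL and main.gold.customers a local gold table. The helper only builds the SQL string; in a notebook you would hand it to spark.sql().

```python
def federated_join_sql(federated_table: str, local_table: str, join_key: str) -> str:
    """Build a query joining a federated table to a local table on one key column."""
    return (
        f"SELECT c.{join_key}, c.segment, SUM(o.amount) AS total_spend\n"
        f"FROM {local_table} AS c\n"
        f"JOIN {federated_table} AS o\n"
        f"  ON c.{join_key} = o.{join_key}\n"
        f"GROUP BY c.{join_key}, c.segment"
    )


# Hypothetical names: a federated PostgreSQL table and a local gold table.
join_stmt = federated_join_sql(
    federated_table="pg_sales.public.orders",
    local_table="main.gold.customers",
    join_key="customer_id",
)
print(join_stmt)
```

Both tables are addressed with the same three-level names, so nothing in the query text marks one of them as external.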

7. Summary

Here's the takeaway. Lakehouse Federation lets you query data in external systems without moving it. You set up a connection, map the external database into Unity Catalog as a foreign catalog, and query with standard SQL. Use it when you need real-time access, when data can't be moved, or when the query volume is manageable. For heavy, repeated workloads, ingesting into the lakehouse gives you better performance. That wraps up governance and sharing - next chapter, we'll package everything for production deployment.

8. Let's practice!

Let's try it.
