
Unity Catalog and Lineage

1. Unity Catalog and Lineage

Who can access what data, and where did it come from? In Databricks, both questions are answered in one place: Unity Catalog.

2. Why governance matters

Picture an organization with hundreds of tables spread across teams. Someone asks: "Can the marketing team see customer financial data?" Or: "This report looks wrong - where does the underlying data actually come from?" Without governance, answering these questions means digging through code, asking around, and hoping someone remembers. Unity Catalog centralizes the answers.

3. The Unity Catalog hierarchy

Unity Catalog organizes everything into a three-level hierarchy. At the top is the metastore, the container for everything else; a Databricks account typically has one metastore per cloud region. Inside a metastore, you create catalogs - think of these as top-level folders, often matching environments like "production" or "development." Inside each catalog, schemas group related objects together - like a "sales" schema containing all sales-related tables, views, and functions. Because the metastore is implicit, you refer to a table by a three-part name: catalog, then schema, then table. This structure maps naturally to how organizations already think about their data.
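To make the naming concrete, here is a minimal Python sketch of how a table is addressed by catalog, schema, and table name. The `TableRef` class, the `quote` helper, and the names "production", "sales", and "orders" are assumptions made up for illustration; they are not part of any Databricks API.

```python
from dataclasses import dataclass


def quote(identifier: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backtick."""
    return "`" + identifier.replace("`", "``") + "`"


@dataclass(frozen=True)
class TableRef:
    """A table addressed by catalog, schema, and table name (hypothetical helper)."""
    catalog: str
    schema: str
    table: str

    @property
    def full_name(self) -> str:
        # The metastore is implicit, so it never appears in the name.
        parts = (self.catalog, self.schema, self.table)
        return ".".join(quote(p) for p in parts)


# Hypothetical names, chosen only for illustration.
orders = TableRef("production", "sales", "orders")
print(orders.full_name)  # prints `production`.`sales`.`orders`
```

Quoting every part is a defensive choice in this sketch: it keeps names with hyphens or spaces valid when the string is pasted into a SQL statement.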

4. Access control

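Permissions in Unity Catalog work at every level. You can grant SELECT on an entire catalog, a specific schema, or an individual table. Permissions are inherited downward - granting SELECT on a catalog gives access to every table in every schema inside it, including tables created later. Inheritance cannot be carved back at a lower level: Unity Catalog has no DENY, and revoking on a child object does not remove a privilege inherited from its parent. To actually read a table, a user also needs USE CATALOG and USE SCHEMA on its parents. This inheritance makes it manageable to govern large environments. You set broad permissions at the catalog or schema level only where broad access is intended, and grant at the table level where access should be narrower.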
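The inheritance rule can be sketched as a toy model in Python. This is an illustration under stated assumptions, not how Unity Catalog is implemented: grants are stored as plain tuples, the group names "analysts" and "marketing" and the table names are hypothetical, and the check reflects the requirement that reading a table needs SELECT plus USE CATALOG and USE SCHEMA on its parents.

```python
def ancestors_and_self(path):
    """Yield the securable itself, then each parent up to the catalog."""
    for end in range(len(path), 0, -1):
        yield path[:end]


def has_privilege(grants, principal, privilege, path):
    """True if the privilege was granted on the object or on any parent."""
    return any(
        (principal, privilege, p) in grants for p in ancestors_and_self(path)
    )


def can_select(grants, principal, catalog, schema, table):
    """Reading a table needs SELECT plus USE CATALOG and USE SCHEMA."""
    return (
        has_privilege(grants, principal, "USE CATALOG", (catalog,))
        and has_privilege(grants, principal, "USE SCHEMA", (catalog, schema))
        and has_privilege(grants, principal, "SELECT", (catalog, schema, table))
    )


# Hypothetical grants: (principal, privilege, securable path).
grants = {
    ("analysts", "USE CATALOG", ("production",)),
    ("analysts", "USE SCHEMA", ("production", "sales")),
    ("analysts", "SELECT", ("production", "sales")),  # schema-wide
    ("marketing", "USE CATALOG", ("production",)),
    ("marketing", "USE SCHEMA", ("production", "sales")),
    ("marketing", "SELECT", ("production", "sales", "orders")),  # one table
}

# Inherited from the schema-level grant.
print(can_select(grants, "analysts", "production", "sales", "refunds"))   # True
# Table-level grant does not extend to sibling tables.
print(can_select(grants, "marketing", "production", "sales", "refunds"))  # False
```

In a workspace, the schema-wide grant in this model would correspond to a statement along the lines of `GRANT SELECT ON SCHEMA production.sales TO analysts`, with the same hypothetical names.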

5. Data lineage

Lineage is Unity Catalog's answer to "where did this data come from?" It automatically tracks the relationships between tables, down to individual columns - which tables feed into which, all the way from raw sources to final dashboards. When you view a table's lineage graph, you can trace upstream to see its sources and downstream to see what depends on it. Lineage is captured automatically as queries run against Unity Catalog tables - you don't need to maintain documentation or build lineage manually.
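As a rough mental model of capture, you can think of each query as an event that records which tables it read and which table it wrote. The Python sketch below builds direct upstream and downstream edges from such events. The event format, the `build_lineage` function, and the table names are all hypothetical; Unity Catalog's real capture mechanism is not shown in this lesson.

```python
from collections import defaultdict

# Hypothetical query events: the tables a query read, and the table it wrote.
events = [
    {"reads": ["prod.bronze.orders"], "writes": "prod.silver.orders"},
    {"reads": ["prod.bronze.customers"], "writes": "prod.silver.customers"},
    {
        "reads": ["prod.silver.orders", "prod.silver.customers"],
        "writes": "prod.gold.revenue",
    },
]


def build_lineage(events):
    """Return (upstream, downstream) maps of direct table-to-table edges."""
    upstream = defaultdict(set)    # table -> tables it was built from
    downstream = defaultdict(set)  # table -> tables built from it
    for event in events:
        target = event["writes"]
        for source in event["reads"]:
            upstream[target].add(source)
            downstream[source].add(target)
    return upstream, downstream


upstream, downstream = build_lineage(events)
print(sorted(upstream["prod.gold.revenue"]))
# ['prod.silver.customers', 'prod.silver.orders']
print(sorted(downstream["prod.bronze.orders"]))
# ['prod.silver.orders']
```

The point of the sketch is that nobody writes the edges by hand: they fall out of the queries that already ran.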

6. Lineage in practice

Lineage has everyday practical uses. A gold table is showing unexpected numbers - trace upstream to find which silver table introduced the issue. You need to drop a column from a bronze table - check downstream lineage first to see if any silver or gold tables depend on it. An auditor asks where customer data flows in your system - lineage gives you the answer without guesswork. It turns what used to be a detective investigation into a quick UI lookup.
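Both the "what breaks if I change this?" and the "where did this bad number come from?" questions are graph traversals. Here is a minimal Python sketch using a breadth-first search over a small lineage graph. The graph, the table names, and the `reachable` and `invert` helpers are assumptions for illustration; in practice you would read the same information from the lineage graph in the UI.

```python
from collections import deque

# Hypothetical direct edges: table -> tables that read from it.
downstream = {
    "bronze.orders": ["silver.orders"],
    "silver.orders": ["gold.revenue", "gold.churn"],
    "silver.customers": ["gold.churn"],
}


def reachable(edges, start):
    """Return every table reachable from start by following edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in edges.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def invert(edges):
    """Flip the direction of every edge (downstream map -> upstream map)."""
    inverted = {}
    for source, targets in edges.items():
        for target in targets:
            inverted.setdefault(target, []).append(source)
    return inverted


upstream = invert(downstream)

# Impact analysis: what depends on bronze.orders, directly or indirectly?
print(sorted(reachable(downstream, "bronze.orders")))
# ['gold.churn', 'gold.revenue', 'silver.orders']

# Root-cause analysis: which tables could have fed bad data into gold.churn?
print(sorted(reachable(upstream, "gold.churn")))
# ['bronze.orders', 'silver.customers', 'silver.orders']
```

Tracking visited tables in `seen` keeps the search from looping if the graph ever contains a cycle.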

7. Summary

To recap: Unity Catalog is the single governance layer for all your Databricks data assets. It uses a three-level hierarchy - metastore, catalog, schema - to organize everything. Access control works through SQL grants that inherit down the hierarchy. And lineage is tracked automatically, giving you visibility into where data comes from and where it goes. In the next lesson, we'll use Unity Catalog to share data securely with people outside your organization.

8. Let's practice!

Time to explore. You'll navigate Unity Catalog, inspect a table's lineage, and trace its data flow.
