Delta Sharing: Types and Trade-offs

1. Delta Sharing: types and trade-offs

Which type of Delta Sharing should you use? The right choice depends on who your recipient is and where their platform lives.

2. Two approaches to sharing

Databricks-native sharing is for when both the provider and the recipient are on Databricks. It's seamless - the recipient sees shared tables directly in their Unity Catalog, with full governance and audit logging on both sides. Open protocol sharing uses the open-source Delta Sharing specification, which means the recipient can be on any platform - Snowflake, Apache Spark, pandas, Power BI, you name it. It's more flexible but requires more setup on the recipient's side.

3. Databricks-native sharing

When both organizations use Databricks, native sharing is the obvious choice. The shared tables show up in the recipient's Unity Catalog as if they were local data. Access control, audit logging, and lineage all work automatically. The recipient doesn't need to install any client libraries or manage credentials - it just works. Think of it as the express lane at the airport. If you both have the right pass, you walk right through.

4. Open protocol sharing

Open protocol sharing is for when the recipient isn't on Databricks. The provider creates the share the same way, but the recipient uses a Delta Sharing client, as in the code, to list and load tables from a profile. That recipient can be on any platform that has a client or connector: Spark, pandas, Power BI, Snowflake, and Tableau are all fair game. The provider sends an activation link, and the recipient uses it to download a credential file, which is the profile the client reads. It takes a bit more setup on the recipient's end, but it means you're not limited to sharing only with Databricks users. The open protocol is exactly that: open. Any tool that implements the spec can read your shared data.
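For reference, here is a minimal sketch of what the recipient side can look like with the open-source delta-sharing Python client. It assumes the credential file was saved locally; the file name config.share and the share, schema, and table names in the usage comments are hypothetical placeholders, not values from this course.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' address the client expects."""
    return f"{profile_path}#{share}.{schema}.{table}"


def list_shared_tables(profile_path: str):
    """Return every table the provider has shared with this recipient."""
    # Imported lazily so table_url() works even without the package installed.
    import delta_sharing  # pip install delta-sharing

    client = delta_sharing.SharingClient(profile_path)
    return client.list_all_tables()


def load_shared_table(profile_path: str, share: str, schema: str, table: str):
    """Load one shared table into a pandas DataFrame."""
    import delta_sharing

    return delta_sharing.load_as_pandas(table_url(profile_path, share, schema, table))


# Example usage (hypothetical names; needs a real credential file and network access):
# tables = list_shared_tables("config.share")
# orders = load_shared_table("config.share", "sales_share", "public", "orders")
```

The profile path and the table's three-part name are joined with a # to form the address that the load call takes, so the only thing the recipient has to manage is the credential file itself.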

5. Cost considerations

There's one more factor to consider - cost. When the provider and recipient are in the same cloud region, data transfer is usually free or very cheap. Cross-region sharing incurs egress charges from your cloud provider. And cross-cloud sharing - say, you're on Azure and the recipient is on AWS - has the highest fees. For occasional small queries, this is negligible. For large-scale ongoing sharing, it adds up. The practical advice: keep your shared data in the same region as your most important recipients, or replicate it to their region if the cost of egress outweighs the cost of storage.
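The egress-versus-storage comparison can be sketched as a small calculation. This is a simplified model, not a pricing tool: all rates are hypothetical placeholders you would replace with your cloud provider's actual prices, and it assumes a replica still pays egress for whatever data changes each month to stay in sync.

```python
def monthly_egress_cost(gb_read_per_month: float, egress_rate_per_gb: float) -> float:
    """Cost of serving recipient reads directly across a region or cloud boundary."""
    return gb_read_per_month * egress_rate_per_gb


def monthly_replica_cost(
    dataset_gb: float,
    storage_rate_per_gb_month: float,
    gb_changed_per_month: float,
    egress_rate_per_gb: float,
) -> float:
    """Cost of keeping a copy in the recipient's region: storage plus sync egress."""
    storage = dataset_gb * storage_rate_per_gb_month
    sync = gb_changed_per_month * egress_rate_per_gb
    return storage + sync


def should_replicate(
    gb_read_per_month: float,
    dataset_gb: float,
    gb_changed_per_month: float,
    egress_rate_per_gb: float,
    storage_rate_per_gb_month: float,
) -> bool:
    """True when a replica is cheaper than paying egress on every read."""
    direct = monthly_egress_cost(gb_read_per_month, egress_rate_per_gb)
    replica = monthly_replica_cost(
        dataset_gb, storage_rate_per_gb_month, gb_changed_per_month, egress_rate_per_gb
    )
    return replica < direct


# Hypothetical rates, for illustration only.
EGRESS_RATE = 0.09    # currency units per GB transferred
STORAGE_RATE = 0.023  # currency units per GB stored per month

heavy = should_replicate(5000, 1000, 100, EGRESS_RATE, STORAGE_RATE)
light = should_replicate(10, 1000, 100, EGRESS_RATE, STORAGE_RATE)
```

With these placeholder rates, a recipient reading 5,000 GB a month from a 1,000 GB dataset favors replication, while one reading 10 GB a month does not.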

6. Choosing the right approach

The decision tree is simple. If both organizations are on Databricks, go native - it's easier and gives you governance on both sides. If the recipient is on another platform, open protocol is your only option and it works well. In either case, factor in where the data physically lives relative to the recipient to avoid surprise egress charges.
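The same decision tree, written out as a short sketch. The function and parameter names are illustrative, and the cloud and region strings are whatever labels you use internally.

```python
from enum import Enum


class SharingType(Enum):
    NATIVE = "Databricks-native"
    OPEN = "open protocol"


def choose_sharing_type(recipient_on_databricks: bool) -> SharingType:
    """Go native when the recipient is on Databricks, open protocol otherwise."""
    return SharingType.NATIVE if recipient_on_databricks else SharingType.OPEN


def egress_tier(
    provider_cloud: str,
    provider_region: str,
    recipient_cloud: str,
    recipient_region: str,
) -> str:
    """Classify where the data lives relative to the recipient, cheapest tier first."""
    if provider_cloud != recipient_cloud:
        return "cross-cloud"   # typically the most expensive case
    if provider_region != recipient_region:
        return "cross-region"  # egress charges apply
    return "same-region"       # usually free or very cheap
```

The two checks are independent: the sharing type depends only on the recipient's platform, while the cost tier depends only on location.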

7. Summary

To sum up: Databricks-native sharing gives you the smoothest experience when both parties are on Databricks. Open protocol sharing extends your reach to any platform. And no matter which approach you choose, pay attention to where the data lives - egress charges are the hidden cost of cross-boundary sharing. So choose based on the recipient's platform and data location. Next, we'll look at the flip side: instead of sharing your data out, what if you need to query someone else's data without bringing it in?

8. Let's practice!

Let's classify. You'll sort characteristics into native and open protocol sharing and work through a cost scenario.
