
What does it mean that Apache Iceberg is an Open Table Format?

1. What does it mean that Apache Iceberg is an Open Table Format?

In this lesson, we'll discuss Apache Iceberg as an open table format, which was created to allow engineers to work with files in the data lake as if they were tables in a traditional relational database. But what does that mean? Let's break it down into two questions. What is a table format, and what does it mean to be an open one?

To paraphrase Ryan Blue, the Apache Iceberg PMC chair, the definition of a table format is simply a description of how to do SQL on a bunch of files. This means a table format is an instruction manual for a query engine like Apache Spark or Trino. It specifies how the engine should interact with a set of data files so that it gets the same benefits it would get from a transactional relational database.

While the Iceberg project does provide concrete implementations in Java, Rust, Go, and Python, the rules for how an engine interacts with an Iceberg table are not written in a code-first manner. Instead, they are defined and ratified in a technical specification referred to as the spec. This is similar to how the HTTP standard is defined by industry practitioners who have agreed upon how the system should act, while implementations of the standard vary depending on the software.

The method through which changes to Apache Iceberg are ratified is where the open part really comes into play. When we say open in terms of Apache Iceberg, it means that Apache Iceberg has an open standard, an open codebase, and most importantly, open governance. Open standard and open codebase mean that anyone who wants to implement an Apache Iceberg client can, and anyone who wants to know how the official libraries are built can have complete transparency. The most important of the three, though, is open governance, which makes the Iceberg project truly exceptional. One of the key goals of Apache Iceberg has always been interoperability, and interoperability cannot be achieved if a single company controls the format.

With this in mind, after its initial creation at Netflix, Iceberg was donated to the Apache Software Foundation. As an Apache project, Iceberg would never be controlled by a single company and could commit to users that it would stay vendor neutral and openly governed. Apache projects use open governance, a collaboration of individuals who, though they may work for direct rivals, find consensus to develop the project. This lets users and vendors trust that as the project evolves, it does so for the benefit of everyone and not just to the advantage of a particular vendor.

Today, Apache Iceberg is supported by dozens of engines, tools, and frameworks, letting users blend together their preferred technologies to build their open Lakehouse. In the next lesson, we'll learn about the concept of an open Lakehouse. We'll touch on what it means for a Lakehouse to be open, and then discuss the core components that will comprise our Apache Iceberg open Lakehouse.
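
To make the idea of a table format as an instruction manual a bit more concrete, here is a toy sketch in Python. It is not Iceberg's actual metadata layout or API: the field names `current_snapshot_id`, `snapshots`, and `data_files`, and the helpers `write_csv`, `scan`, and `demo`, are all invented for illustration. The only point it makes is that a reader follows the metadata to decide which files belong to the table, rather than reading whatever happens to sit in a directory.

```python
import csv
import tempfile
from pathlib import Path


def write_csv(path, rows):
    """Write a small data file with a fixed two-column header."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "city"])
        writer.writerows(rows)


def scan(metadata):
    """Return the rows of the table as of its current snapshot.

    The reader never lists the directory. It only opens the files
    that the current snapshot names (hypothetical toy layout).
    """
    current = metadata["current_snapshot_id"]
    snapshot = next(s for s in metadata["snapshots"] if s["id"] == current)
    rows = []
    for file_path in snapshot["data_files"]:
        with open(file_path, newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows


def demo():
    """Build a tiny table, read it, then 'commit' a new snapshot."""
    with tempfile.TemporaryDirectory() as tmp:
        a = Path(tmp) / "a.csv"
        b = Path(tmp) / "b.csv"
        stray = Path(tmp) / "stray.csv"
        write_csv(a, [(1, "Oslo"), (2, "Lima")])
        write_csv(b, [(3, "Pune")])
        # Present in the directory but never referenced by any snapshot.
        write_csv(stray, [(99, "Nowhere")])

        # Toy metadata: each snapshot lists the files that make up the table.
        metadata = {
            "current_snapshot_id": 1,
            "snapshots": [
                {"id": 1, "data_files": [str(a)]},
                {"id": 2, "data_files": [str(a), str(b)]},
            ],
        }

        before = scan(metadata)
        # Moving one pointer changes what every reader sees.
        metadata["current_snapshot_id"] = 2
        after = scan(metadata)
        return before, after


if __name__ == "__main__":
    before, after = demo()
    print(len(before), "rows before commit;", len(after), "rows after commit")
```

In this sketch, any program that understands the same toy rules would see the same table, regardless of which language it is written in. That is the role the spec plays for real engines, at far greater depth than shown here.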

2. Let's practice!
