Course Introduction

1. Course Introduction

Data lakes were supposed to solve everything. Cheap storage, any format, unlimited scale. But if you've worked with one, you know that reality can be far from that. Broken queries from concurrent writes, no way to update or delete records without rewriting entire partitions, schema changes that require days of downtime, and don't even ask about schema enforcement. What if there was a better way? What if you could have the flexibility of a data lake with the ACID reliability of a database? That's exactly what Apache Iceberg delivers. And in this course, you're going to learn how to use it.

Hello, I'm Russell Spitzer, principal engineer at Snowflake and a member of the Apache Iceberg Project Management Committee. In this course, we'll explain what open source Apache Iceberg is, why you should consider Iceberg tables for your data lake, and show real-world examples of Iceberg's many capabilities along the way. You'll learn more about what makes Apache Iceberg so powerful, like its handling of concurrent modifications and ACID transactions, and how its interoperability lets us handle massive amounts of data across a variety of tools.

This course will be good for you if you have some coding or data engineering background, are familiar with basic data table concepts, and are eager to learn more about open source technologies.

In this opening module, we'll start with the basics. We'll go over what exactly Apache Iceberg is, and then dive into hands-on work with basic reading and writing to and from Iceberg tables. The second module will take us to the next level, and I'll show you how to productionize your Apache Iceberg tables and move external data into Apache Iceberg. The final module is focused on keeping your Apache Iceberg lakehouse running smoothly. We'll explore maintenance procedures, optimizations, and which anti-patterns to avoid to achieve the best performance.

As we go through the course, you can follow along with the exercises we've included, which take advantage of a Docker environment we've set up just for this course. It will allow you to simulate an entire lakehouse environment on your own machine using real production-grade components, albeit in test configurations, giving you a real experience of working with Apache Iceberg. By the time you are finished, you'll be ready to start designing data projects with Iceberg as a key component. And who knows? You may even be inspired to join the Apache Iceberg community and submit your own code to the project.

2. Let's practice!
