Get startedGet started for free

External and Iceberg Tables

1. External and Iceberg Tables

In this video you'll learn how Snowflake can query data that lives outside its native storage - whether in cloud storage or open table formats - without ever loading it in. We'll look at external tables, Iceberg tables, and when to reach for each one.

2. What is an External Table?

Harbr receives supplier inventory files from dozens of partners every day. Some are huge; some are only queried occasionally. Loading all of them into Snowflake doesn't make sense. External tables and Iceberg tables are built for exactly this situation. The data stays in S3, GCS or Azure Blob. Snowflake stores the metadata and only reads files at query time.

3. Creating an External Table

Creating an external table takes three things: a LOCATION pointing to a path in a named stage, a FILE_FORMAT telling Snowflake how to parse the files, and AUTO_REFRESH set to true so Snowflake picks up new files automatically as they land. From that point you query it exactly like a native table - the difference is entirely about where the data lives.

4. When to Use External Tables

External tables make the most sense in a few specific situations. When a partner owns the files, you can query their S3 bucket directly without taking on data management. When files are large but accessed infrequently, loading them would cost more in storage than the query speed is worth. When compliance rules require data to stay in its original location, external tables let you query without moving anything. And when you're still deciding what to load, they let you explore raw files before committing. In each case, the SQL is identical to what you'd write against a native table.

5. External Tables vs Loading Data

The choice comes down to query frequency and latency requirements. External tables cost nothing in Snowflake storage and always reflect cloud storage, but queries are slower because files are read every time. Loading with COPY INTO gives full query performance at the cost of storage and managing load frequency. For Harbr's archived supplier files queried monthly, external tables are right. For live delivery events queried every minute, load them in.

6. Apache Iceberg Tables

Traditional data lakes stored files in S3, Azure Blob, or GCS with no reliable way to handle concurrent writes, schema changes, or historical queries. You could read the files, but you couldn't trust them the way you'd trust a database. Apache Iceberg fixes that — it's an open table format that adds a metadata layer on top of plain Parquet (or ORC/Avro) files, giving your data lake database-grade reliability without locking you into any single engine.

7. Snowflake-managed vs Externally-managed

Both types store data in your own cloud storage - the difference is who owns the metadata catalog. In the syntax you can see it in one line: CATALOG equals SNOWFLAKE for Snowflake-managed, or the name of an external catalog for externally-managed. Snowflake-managed gives you full read-write access from SQL. With an external catalog, Snowflake can read but cannot write - the external system owns the metadata. If Snowflake is your only engine, managed is simpler. If Spark or Flink also need to write to the same tables, externally-managed lets every engine share the same data without conflict.

8. Let's Practice!

You've covered two ways to work with data that lives outside Snowflake's native storage: external tables for querying files in place, and Iceberg tables for open-format multi-engine access. Let's put that to the test.

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.