Stages, File Formats, and COPY INTO

1. Stages, File Formats, and COPY INTO

Hello, my name is INSTRUCTOR NAME and I'd like to welcome you to this course on Data Pipeline Automation in Snowflake.

2. Snowflake powers many workloads

We'll cover everything from getting data into Snowflake to orchestrating pipelines, transforming and querying data, and optimizing for query performance. Snowflake is a complex platform powering many workloads.

3. Example pipeline

Throughout this course we'll discover how pipelines work exactly in the Snowflake platform. In this first video we're looking at how data lands in Snowflake - specifically stages, file formats, and COPY INTO.

4. Meet Harbr

Throughout this course we'll be working with Harbr - a global logistics and supply chain platform. Their data team uses Snowflake to track shipments, manage warehouse inventory, monitor delivery performance, and ingest data from supplier systems across multiple regions. It's a great example because it covers the kind of real-world variety you'll encounter in practice.

5. The Data Loading Problem

Harbr receives data from dozens of sources - CSV files from suppliers, JSON event feeds from logistics partners, and exports from internal ERP systems. Each has a different structure, format, and delivery mechanism. Before any of it can be queried in Snowflake, it needs to land somewhere Snowflake can read. That's where stages and file formats come in. A stage is a repository where files are stored as an intermediary between your data source and a Snowflake table.

6. Stage Types

There are two types of stages: internal and external. Internal stages are managed by Snowflake. External stages point to a location on a external provider such as Amazon S3, Azure Blob Storage, or Google Cloud Storage - and let you load data from there into Snowflake tables.Using stages simplifies and streamlines bulk loading data into and unloading data out of tables. There are three types of internal stages: user, table and named. A user stage is a personal staging area for a single user. A table stage is tied to a specific table therefore multiple users can stage files there, but they can only load into that one table. These two types of stages are created automatically by Snowflake. A named stage is a database object created in a schema, making it shareable across roles and sessions Similarly, an external stage can have a named stage that's location is on a cloud provider and acts as a pointer to it's external cloud location.

7. Stage Parameters

When you create a stage, you have a couple of additional options worth knowing about. A stage can include a directory table which is a catalog of the files sitting in that stage. It gives you a queryable view of what's staged without touching the data itself. You can also configure encryption at the stage level. For internal stages, Snowflake gives you two options: full encryption or server-side encryption. For external stages, encryption is cloud-specific and only required if your files or storage location are encrypted.

8. File Formats

File formats tell Snowflake how to parse files in the stage. Harbr works with structured CSVs and semi-structured JSON payloads. For semi-structured formats, Snowflake stores the document in a VARIANT column - no upfront schema needed, structure is resolved at query time. Rather than specifying options inline on every COPY INTO, you create a named file format once and reference it by name. If Harbr's CSV structure changes, updating the format object updates every pipeline that references it.

9. COPY INTO

COPY INTO is the command that moves data from a stage into a Snowflake table. The target is the fully qualified table name, so in this case Harbr's shipments table. FROM specifies the stage and path. FILE_FORMAT references the named format object. And ON_ERROR controls what happens if Snowflake encounters a bad row. This single command replaces what would otherwise be a multi-step ETL process. The data is already in the stage. COPY INTO reads it, parses it, and loads it.

10. Error handling

ON_ERROR controls what happens when Snowflake hits a bad row. ABORT_STATEMENT is the default — one bad row fails the whole load. CONTINUE skips bad rows and loads everything else, useful when a partial load beats no load. SKIP_FILE skips the entire file if it contains errors. SKIP_FILE_N and SKIP_FILE_N_PERCENT let you set count or percentage thresholds before a file is skipped.

11. Validation mode and COPY_HISTORY

Before loading, run the same COPY INTO command with VALIDATION_MODE equals RETURN_ERRORS. Snowflake parses the file and surfaces any rows that would fail — without touching the target table. It's a dry run: same stage, same path, same format, just no data written. Once a load has run, COPY_HISTORY gives you the full record. Pass in the table name and a start time, and Snowflake returns every load operation in that window: rows loaded, rows skipped, and any errors.

12. Let's practice!

You've covered stage types, file formats, COPY INTO syntax, error handling options, and how to validate and inspect loads. Time to practice!

Create Your Free Account

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.