Data Unloading and Connectivity

1. Data Unloading and Connectivity

Sometimes Harbr needs to get data out of Snowflake, for example to export processed results back to a partner's cloud storage. At other times, data needs to move between Snowflake and systems that don't produce files at all: Kafka streams, Spark jobs, and BI tools. This video covers both: data unloading and Snowflake's connectivity landscape.

2. The Unloading Use Case

Not all of Harbr's logistics partners have Snowflake accounts. At the end of each day, the team generates shipment summaries and needs to make them available in S3 for partners to review. Snowflake's COPY INTO command handles this natively: it writes query results directly to cloud storage, with no separate export pipeline. The syntax mirrors the loading version of COPY INTO, but the target is a stage location rather than a table. The FROM clause accepts a subquery, so you can filter, join, and transform data before exporting it.
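The following minimal sketch puts the two directions side by side. The statements are written as Python strings, so any client that executes SQL could submit them. In a load, the table is the target and a stage is the source. In an unload, a stage path is the target and a query is the source. The tables `shipments` and `partners`, the stages `@harbr_inbound_stage` and `@partner_s3_stage`, and all column names are hypothetical placeholders, not Harbr's actual schema.

```python
# Hypothetical names throughout: tables shipments/partners,
# stages @harbr_inbound_stage and @partner_s3_stage.

# Loading: the target is a table, the source is files in a stage.
LOAD_SQL = """
COPY INTO shipments
FROM @harbr_inbound_stage/shipments/
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""".strip()

# Unloading: the target is a stage path, the source is a subquery
# that filters, joins, and transforms before anything is written.
UNLOAD_SQL = """
COPY INTO @partner_s3_stage/summaries/
FROM (
    SELECT s.shipment_id,
           p.partner_name,
           s.ship_date,
           ROUND(s.weight_kg, 1) AS weight_kg
    FROM shipments s
    JOIN partners p ON p.partner_id = s.partner_id
    WHERE s.status = 'DELIVERED'
)
FILE_FORMAT = (TYPE = CSV)
""".strip()
```

Only the positions of the table and the stage swap between the two statements. The unload adds a parenthesized query where the load has a stage path.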

3. COPY INTO

Here, Harbr exports today's shipment records to a partner's S3 stage. Setting HEADER equals TRUE adds column names to the output, which is important when the partner loads the file into their own system. Setting OVERWRITE equals TRUE overwrites existing files with matching names at that path. This is useful for daily exports, where a rerun should cleanly replace the previous output rather than leave duplicate files alongside it.
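The following is a minimal sketch of a daily export statement with both options set. The helper `build_daily_export`, the stage `@partner_s3_stage`, the `shipments` table, and its columns are hypothetical. The date comes from a Python `date` object rather than `CURRENT_DATE()`, so a past day can be re-exported on demand.

```python
from datetime import date


def build_daily_export(export_date: date,
                       stage: str = "@partner_s3_stage") -> str:
    """Build a COPY INTO <location> statement for one day's shipments.

    Stage, table, and column names are hypothetical placeholders.
    """
    # isoformat() of a date object is always YYYY-MM-DD, so inlining it is safe
    day = export_date.isoformat()
    return (
        # One prefix per day: OVERWRITE only affects reruns of that same day
        f"COPY INTO {stage}/shipments/{day}/\n"
        "FROM (\n"
        "    SELECT shipment_id, origin, destination, status, ship_date\n"
        "    FROM shipments\n"
        f"    WHERE ship_date = '{day}'\n"
        ")\n"
        "FILE_FORMAT = (TYPE = CSV)\n"
        "HEADER = TRUE\n"      # column names as the first row
        "OVERWRITE = TRUE"     # replace same-named files on a rerun
    )
```

Writing each day to its own prefix means OVERWRITE = TRUE only replaces files from a rerun of the same day's export. By default, Snowflake gzip-compresses CSV unloads. Adding `COMPRESSION = NONE` inside the FILE_FORMAT clause produces plain CSV files if a partner needs them.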

4. Export File Format Options

Snowflake supports three main unloading formats. CSV is the universal choice: almost every system can read it. JSON preserves nested structures and is useful when the consuming system expects semi-structured output. Parquet is the most efficient for large analytical datasets, because columnar compression means smaller files and faster reads for downstream processing. For Harbr, CSV works for partner exports, while Parquet makes more sense for internal exports feeding Spark jobs.

Three options control how the output is written. HEADER equals TRUE adds column names as the first row of the file, which is essential when a partner loads the export into their own system. OVERWRITE equals TRUE overwrites files with matching names at the target path, the right default for daily exports where you want a clean replace. And MAX_FILE_SIZE sets an upper limit, in bytes, on each output file. Snowflake splits the export across multiple files automatically, which lets downstream tools read them in parallel.
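The following sketch shows how the format choice and the three options combine in one statement. The helper `build_unload`, the stage name, and the 100 MB cap are hypothetical choices. For JSON, the query is wrapped in `OBJECT_CONSTRUCT(*)`, because a JSON unload expects a single object column per row. HEADER is added only for CSV and Parquet.

```python
# Hypothetical helper; stage names and the 100 MB default cap are placeholders.

FILE_FORMATS = {
    "csv": "FILE_FORMAT = (TYPE = CSV)",
    "json": "FILE_FORMAT = (TYPE = JSON)",
    "parquet": "FILE_FORMAT = (TYPE = PARQUET)",
}


def build_unload(target: str, query: str, fmt: str,
                 max_file_size: int = 100 * 1024 * 1024) -> str:
    """Build a COPY INTO <location> statement for CSV, JSON, or Parquet."""
    fmt = fmt.lower()
    if fmt not in FILE_FORMATS:
        raise ValueError(f"unsupported unload format: {fmt}")
    if fmt == "json":
        # JSON unloads expect one VARIANT/OBJECT column per row, so each
        # row is packed into an object keyed by column name.
        query = f"SELECT OBJECT_CONSTRUCT(*) FROM ({query})"
    options = [FILE_FORMATS[fmt]]
    if fmt in ("csv", "parquet"):
        # CSV: header row. Parquet: keeps real column names in the schema.
        options.append("HEADER = TRUE")
    # Upper limit in bytes per file; Snowflake may write several files in parallel.
    options.append(f"MAX_FILE_SIZE = {max_file_size}")
    return f"COPY INTO {target}\nFROM ({query})\n" + "\n".join(options)
```

For example, `build_unload("@internal_stage/shipments/", "SELECT * FROM shipments", "parquet")` yields a Parquet unload capped at 100 MB per file. That is the kind of internal export that could feed Harbr's Spark jobs.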

5. The Connectivity Landscape

The unloading concepts cover file-based data movement, but what about systems that don't work through files? This is where Snowflake connectors come in. Harbr's infrastructure includes a Kafka event bus, Spark-based transformation jobs, Python data pipelines, and BI tools querying Snowflake directly. Snowflake supports all of these through purpose-built connectors and drivers, each serving a different integration pattern.

6. Kafka and Spark Connectors

The Kafka Connector streams records from Kafka topics into Snowflake tables. When it is configured to use Snowpipe Streaming, rows are written directly, with no intermediate files or stages, which is why latency is measured in seconds. The Spark Connector integrates with Spark's DataFrame API, letting Spark jobs read from and write to Snowflake as part of large-scale transformation workloads.
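The following sketch shows what each connector's configuration might look like. The first part is a Kafka Connect payload for the Snowflake sink that selects Snowpipe Streaming as the ingestion method. The second part is a pair of PySpark read and write helpers that use the Spark Connector's data source. Property and option names follow the connectors' documented settings, but the exact set can vary by connector version. The connector name, topic, table, database, role, warehouse, and credentials are hypothetical placeholders.

```python
import json

# Kafka Connect payload (POST /connectors) for the Snowflake sink connector.
# All values other than the class name and ingestion method are placeholders.
kafka_connector = {
    "name": "harbr_shipment_events_sink",
    "config": {
        "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
        "tasks.max": "4",
        "topics": "shipment-events",
        "snowflake.topic2table.map": "shipment-events:SHIPMENT_EVENTS",
        "snowflake.url.name": "<account_identifier>.snowflakecomputing.com:443",
        "snowflake.user.name": "KAFKA_CONNECTOR_USER",
        "snowflake.private.key": "<private-key>",
        "snowflake.role.name": "KAFKA_CONNECTOR_ROLE",
        "snowflake.database.name": "HARBR",
        "snowflake.schema.name": "RAW",
        # Row-level streaming ingestion instead of staged files
        "snowflake.ingestion.method": "SNOWPIPE_STREAMING",
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "false",
    },
}


def connector_payload() -> str:
    """JSON body to submit to the Kafka Connect REST API."""
    return json.dumps(kafka_connector, indent=2)


# Spark Connector: data source name plus connection options (placeholders).
SPARK_SOURCE = "net.snowflake.spark.snowflake"
spark_options = {
    "sfURL": "<account_identifier>.snowflakecomputing.com",
    "sfUser": "SPARK_USER",
    "sfPassword": "<password>",
    "sfDatabase": "HARBR",
    "sfSchema": "ANALYTICS",
    "sfWarehouse": "TRANSFORM_WH",
}


def read_in_transit(spark):
    """Read a query result into a DataFrame via an existing SparkSession."""
    return (spark.read.format(SPARK_SOURCE)
            .options(**spark_options)
            .option("query", "SELECT * FROM shipments WHERE status = 'IN_TRANSIT'")
            .load())


def write_summary(df):
    """Write a DataFrame back to a (hypothetical) Snowflake table."""
    (df.write.format(SPARK_SOURCE)
       .options(**spark_options)
       .option("dbtable", "SHIPMENT_SUMMARY")
       .mode("overwrite")
       .save())
```

The Spark helpers take an existing session or DataFrame as input, so they assume the Spark Connector and JDBC driver are already on the cluster's classpath.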

7. JDBC/ODBC and Partner Integrations

JDBC and ODBC are the universal connection layer for SQL-compatible tools, including BI tools like Tableau, Power BI, and Looker, which query Snowflake directly without moving any data. The Python Connector is Snowflake's native driver for Python. Harbr's data engineering team uses it to run scheduled ETL jobs, orchestrate pipeline steps, and feed data science workflows: anywhere Python code needs to interact with Snowflake directly.

Partner integrations cover the rest. dbt runs SQL transformation models directly on Snowflake's compute. Fivetran and Airbyte are managed ingestion platforms that connect source systems to Snowflake without requiring custom connector code. At Harbr, dbt handles transformations and Fivetran handles ingestion from SaaS source systems.
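Here is a minimal sketch of how a scheduled Python job might use the Python Connector. The job opens a connection with `snowflake.connector.connect()` and runs a parameterized query through a cursor. The credentials, warehouse, database, and `shipments` table are hypothetical, as are the helper names `connect` and `fetch_daily_status_counts`. The query function accepts any DB-API-style connection, which keeps it testable without a Snowflake account.

```python
from datetime import date


def fetch_daily_status_counts(conn, day: date):
    """Return (status, count) rows for one day's shipments.

    `conn` is a snowflake.connector connection or any DB-API-style object
    with cursor(); the table and column names are hypothetical.
    """
    cur = conn.cursor()
    try:
        # The connector's default paramstyle is pyformat, so %s is the placeholder
        cur.execute(
            "SELECT status, COUNT(*) FROM shipments "
            "WHERE ship_date = %s GROUP BY status ORDER BY status",
            (day,),
        )
        return cur.fetchall()
    finally:
        cur.close()


def connect():
    """Open a Snowflake connection (all values are placeholders)."""
    import snowflake.connector  # pip install snowflake-connector-python

    return snowflake.connector.connect(
        account="<account_identifier>",
        user="ETL_USER",
        password="<password>",
        warehouse="ETL_WH",
        database="HARBR",
        schema="ANALYTICS",
    )
```

Suppose the `snowflake-connector-python` package is installed and real credentials are supplied. Then `conn = connect()` followed by `fetch_daily_status_counts(conn, date.today())` returns one `(status, count)` row per status. Call `conn.close()` when the job finishes.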

8. Let's practice!

You've covered how COPY INTO unloads query results to cloud storage, the export format options, and how Snowflake connects to the broader data ecosystem through Kafka, Spark, JDBC, and partner integrations. Let's practice!