Working with Unstructured Data

1. Working with Unstructured Data

In this video we'll be looking at how Snowflake handles unstructured files.

2. Beyond Tables

Snowflake was originally built around structured tables, but teams increasingly store PDFs, images, audio, and other documents alongside that data. You cannot query a raw PDF like a row-and-column table, but Snowflake gives you stages, catalogs, and access patterns so files stay in the platform. Later in your learning path you may combine these files with Cortex and LLM functions for search and enrichment; here we focus on storage and governance.

3. Stages: The Landing Zone for Files

We’ve already introduced the concept of stages. Internal stages hold files managed directly by Snowflake. External stages reference files sitting in places like S3, Azure Blob Storage, or Google Cloud Storage. For unstructured data, the stage is where the files live. Unlike many structured pipelines, the stage is often the permanent home for those objects, not just a temporary landing zone.

4. Directory Tables

A directory table is a metadata layer that Snowflake automatically generates on top of a stage. When you enable it, Snowflake starts cataloguing every file in that stage — its name, size, last-modified date, and a URL for accessing it.

5. Directory Tables Syntax

To set one up, you create a stage with the directory option enabled. The `DIRECTORY = (ENABLE = TRUE)` parameter is what tells Snowflake to start tracking the files in that stage. You don't build the table yourself; Snowflake generates it automatically once that flag is set. When new files arrive in the stage, the directory table doesn't update on its own by default. You use ALTER STAGE with the REFRESH keyword to sync it.
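
As a minimal sketch of the two statements described here, assuming an internal stage with the hypothetical name `snowy_peak_stage` (the exact identifier is not given in the video):

```sql
-- Create an internal stage with a directory table enabled.
-- snowy_peak_stage is a hypothetical name used for illustration.
CREATE OR REPLACE STAGE snowy_peak_stage
  DIRECTORY = (ENABLE = TRUE);

-- After new files are uploaded, sync the directory table manually.
ALTER STAGE snowy_peak_stage REFRESH;
```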

6. Directory Tables

That tells Snowflake to scan the stage and update the catalogue with anything new. You can also configure automatic refresh, but the manual command is what you'll reach for during testing and ad hoc loads.

7. Querying a Directory Table

You can query a directory table using the DIRECTORY table function, passing the stage name prefixed with the @ symbol. Here we’re pulling the file name, size, and last-modified date from the Snowy Peak stage. The team can use this to audit what’s in the stage, spot stale files, or feed any downstream processes that need to know which reports are available.
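
A query along these lines might look like the sketch below. The stage name `snowy_peak_stage` is an assumption; `RELATIVE_PATH`, `SIZE`, and `LAST_MODIFIED` are standard directory table columns.

```sql
-- List files in the stage, most recently modified first.
-- snowy_peak_stage is a hypothetical stage name.
SELECT
  relative_path,   -- file path relative to the stage root
  size,            -- file size in bytes
  last_modified    -- timestamp of the last update
FROM DIRECTORY(@snowy_peak_stage)
ORDER BY last_modified DESC;
```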

8. Pre-signed URLs

Directory tables list what is in a stage. When something outside Snowflake needs temporary access to a file, Snowflake can expose several URL types depending on the integration: file URLs (`BUILD_STAGE_FILE_URL`), scoped URLs (`BUILD_SCOPED_FILE_URL`), and pre-signed URLs (`GET_PRESIGNED_URL`). They differ in scope, lifetime, and who can use them. `GET_PRESIGNED_URL` is the common pattern for a time-limited, shareable HTTPS link to one object. You pass the stage, the file path, and the lifetime in seconds. The link works until it expires, which is useful for partners or apps that do not have Snowflake logins. Use the right URL type for your security model, and refer to the Snowflake docs for the full matrix.
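
To make the three arguments concrete, here is a hedged sketch. The stage name `snowy_peak_stage` and the file path `reports/q1_summary.pdf` are hypothetical placeholders, and the lifetimes are arbitrary choices.

```sql
-- Generate a link to a single file that expires after one hour (3600 seconds).
-- Stage name and file path are hypothetical.
SELECT GET_PRESIGNED_URL(@snowy_peak_stage, 'reports/q1_summary.pdf', 3600) AS presigned_url;

-- Generate a 10-minute link for every file listed in the directory table.
SELECT
  relative_path,
  GET_PRESIGNED_URL(@snowy_peak_stage, relative_path, 600) AS presigned_url
FROM DIRECTORY(@snowy_peak_stage);
```

Shorter lifetimes limit how long a leaked link stays usable, so it is usually worth choosing the smallest value the recipient can work with.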

9. Let's practice!

Let’s apply what you’ve learned about working with unstructured data.
