Tasks and DAG Orchestration

1. Tasks and DAG Orchestration

Harbr's data team is running a SQL script manually every morning to update the ops dashboard. One missed run and the business is flying blind. In this video you'll see how Snowflake tasks eliminate that manual dependency — scheduling SQL on a CRON, chaining steps into DAGs, and combining with streams for fully automated change data capture.

2. Solution For Manual Workloads: Tasks

Harbr's data engineering team starts every morning the same way: someone opens a laptop, finds the SQL script, and manually refreshes the ops dashboard. It works, until it doesn't. One missed run and the business is flying blind. One duplicate run and the dashboard shows inflated numbers. The script itself isn't the problem; the manual dependency is. This is exactly what Snowflake tasks are built for. A task wraps a SQL statement and runs it on a schedule, automatically. No human in the loop, no risk of forgetting, no double runs. In the next slides, we'll build a task pipeline that keeps Harbr's ops dashboard fresh, using multiple tasks to update the fact table in the right order, every time.

3. What Do We Need To Know About Tasks

A task runs SQL on a schedule, with no external cron jobs or orchestration tools required. It can execute a single SQL statement or call a stored procedure for more complex logic. Everything lives natively inside Snowflake. Here, Harbr creates a task called refresh_dashboard. It runs at 6 AM UTC every day using a CRON expression, and calls the refresh_ops_dashboard stored procedure, which holds the same script the team was running manually every morning.
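
As a rough sketch, the task described here might look like the following. The identifier spellings refresh_dashboard and refresh_ops_dashboard are inferred from the spoken names, and the warehouse name ops_wh is a hypothetical placeholder, not something given in the video.

```sql
-- Sketch only: ops_wh is a hypothetical warehouse name.
CREATE OR REPLACE TASK refresh_dashboard
  WAREHOUSE = ops_wh
  -- CRON fields: minute hour day-of-month month day-of-week, then a time zone.
  -- "0 6 * * *" means 06:00 every day.
  SCHEDULE = 'USING CRON 0 6 * * * UTC'
AS
  -- Assumes the morning script has already been wrapped in this procedure.
  CALL refresh_ops_dashboard();
```

The task is created in a suspended state, so nothing runs until it is resumed.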

4. Creating a Standalone Task

To create a task, call CREATE OR REPLACE TASK and set a name. WAREHOUSE specifies the compute to use. SCHEDULE with a CRON expression defines when it runs - here, every hour at five past. Finally, the AS block determines what is actually executed, which is an INSERT INTO statement that pulls unprocessed events into a summary table.
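
A minimal sketch of such a standalone task is shown below. Every object name in it (summarize_events, ops_wh, raw_events, event_summary, and the processed flag column) is an assumption made for illustration; the video does not name them.

```sql
-- Sketch only: all table, column, task, and warehouse names are hypothetical.
CREATE OR REPLACE TASK summarize_events
  WAREHOUSE = ops_wh                     -- compute used for each run
  SCHEDULE = 'USING CRON 5 * * * * UTC'  -- minute 5 of every hour
AS
  -- The AS block holds the single statement the task executes.
  INSERT INTO event_summary (event_type, event_count, summarized_at)
  SELECT event_type, COUNT(*), CURRENT_TIMESTAMP()
  FROM raw_events
  WHERE processed = FALSE                -- assumed flag for unprocessed events
  GROUP BY event_type;
```

Note that this sketch never sets the assumed processed flag, so a real pipeline would need a separate step (or a stream) to avoid summarising the same rows twice.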

5. Warehouse-based vs Serverless

You have two compute options. Warehouse-based tasks run on an existing virtual warehouse at warehouse rates. A warehouse bills per second with a 60-second minimum each time it resumes, so a 10-second task that wakes a suspended warehouse is still billed as 60 seconds. Serverless tasks omit the WAREHOUSE clause; Snowflake sizes the compute for the task automatically and bills per second of use. Serverless is the right default for isolated tasks; a named warehouse suits larger shared workloads.
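
For comparison, a serverless version of a task could be sketched as follows. The task and procedure names are assumptions; the only structural difference from a warehouse-based task is the missing WAREHOUSE clause. The USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE parameter is optional and only hints at the compute size for the first runs.

```sql
-- Sketch only: task and procedure names are hypothetical.
CREATE OR REPLACE TASK refresh_dashboard_serverless
  -- No WAREHOUSE clause: Snowflake manages the compute for this task.
  USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'  -- optional starting size
  SCHEDULE = 'USING CRON 0 6 * * * UTC'
AS
  CALL refresh_ops_dashboard();
```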

6. DAG-based Task Orchestration

A standalone task works for single steps, but Harbr's pipeline has three: ingest raw events, clean them, then build a summary. Each should only run once the previous has finished. This is a DAG, a directed acyclic graph. Only the root task holds the CRON schedule. Each downstream task names its predecessor with an AFTER clause and starts when that predecessor completes successfully. The schedule is centralised: when it changes, you update the root task and nothing else. Simply said, DAGs explicitly control when and in what order things run, which makes them central to pipelines built on streams and tasks.
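
The three-step pipeline could be sketched like this. All task, procedure, and warehouse names are hypothetical, and each step is assumed to be wrapped in a stored procedure to keep the example short.

```sql
-- Sketch only: every name below is a placeholder.

-- Root task: the only one with a SCHEDULE.
CREATE OR REPLACE TASK ingest_raw_events
  WAREHOUSE = ops_wh
  SCHEDULE = 'USING CRON 0 6 * * * UTC'
AS
  CALL ingest_raw_events_proc();

-- Child task: no schedule, runs after the root task succeeds.
CREATE OR REPLACE TASK clean_events
  WAREHOUSE = ops_wh
  AFTER ingest_raw_events
AS
  CALL clean_events_proc();

-- Final task: runs after the cleaning step succeeds.
CREATE OR REPLACE TASK build_summary
  WAREHOUSE = ops_wh
  AFTER clean_events
AS
  CALL build_summary_proc();
```

Changing the run time later means altering only ingest_raw_events; the two child tasks follow automatically.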

7. Managing Task States

A task is in one of two states: SUSPENDED on creation, or STARTED once activated. Each individual run of a started task then typically ends as SUCCEEDED or FAILED. Tasks start suspended intentionally, so you can set them up and review them before they begin running on schedule. To activate a task, run ALTER TASK with the task name and RESUME. To pause it, use ALTER TASK with SUSPEND. If you need to trigger a task immediately without waiting for the schedule, you can use EXECUTE TASK to run it manually. When working with a DAG of dependent tasks, activating each task individually gets unwieldy. Instead, use the system function SYSTEM$TASK_DEPENDENTS_ENABLE, passing in the root task name. This resumes all tasks in the DAG in one call.
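
The commands mentioned here can be sketched as follows. The task names refresh_dashboard and ingest_raw_events are assumed spellings used for illustration; substitute the names of your own tasks.

```sql
-- Sketch only: task names are assumptions.

-- Activate a task so it starts running on its schedule.
ALTER TASK refresh_dashboard RESUME;

-- Pause it again.
ALTER TASK refresh_dashboard SUSPEND;

-- Trigger a single run right now, independent of the schedule.
EXECUTE TASK refresh_dashboard;

-- Resume a root task and all of its dependent tasks in one call.
SELECT SYSTEM$TASK_DEPENDENTS_ENABLE('ingest_raw_events');

-- Inspect recent runs and their outcomes (for example SUCCEEDED or FAILED).
SELECT name, state, scheduled_time, error_message
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(TASK_NAME => 'REFRESH_DASHBOARD'))
ORDER BY scheduled_time DESC;
```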

8. Tasks and Streams Combined

Here's how we pull tasks and streams together. Let's use an example. We have a stream on our delivery_events table that tracks every new insert. At each scheduled time, the task's WHEN condition first checks whether the stream has data, using SYSTEM$STREAM_HAS_DATA. If the stream is empty, the run is skipped entirely and the task waits for the next scheduled run. When new rows are present, the WHEN condition evaluates to true and the task body runs. It reads only the new change records from the stream and ingests them into logistics.processed_events. That's a continuous change data capture pipeline: new events flow through automatically, and compute is only consumed when there's actually something to process.
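
A minimal sketch of this pattern follows. The table names delivery_events and logistics.processed_events come from the example; the stream name, task name, warehouse, five-minute schedule, and column list are all assumptions made for illustration.

```sql
-- Sketch only: stream, task, warehouse, and column names are hypothetical.

-- Track new rows on the source table. APPEND_ONLY is an assumption that
-- matches "tracks every new insert"; a standard stream also records
-- updates and deletes.
CREATE OR REPLACE STREAM delivery_events_stream
  ON TABLE delivery_events
  APPEND_ONLY = TRUE;

CREATE OR REPLACE TASK process_delivery_events
  WAREHOUSE = ops_wh
  SCHEDULE = '5 MINUTE'
  -- The run is skipped when the stream holds no new change records.
  WHEN SYSTEM$STREAM_HAS_DATA('delivery_events_stream')
AS
  INSERT INTO logistics.processed_events (event_id, delivery_id, event_type, event_time)
  SELECT event_id, delivery_id, event_type, event_time
  FROM delivery_events_stream
  WHERE METADATA$ACTION = 'INSERT';
```

Because the INSERT reads from the stream inside a DML statement, the stream's offset advances when the statement commits, so the next run sees only rows that arrived afterwards.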

9. Let's practice!

You've learned how to build automated pipelines with standalone tasks, how DAG chains coordinate multi-step workflows, how to manage task states, and how streams and tasks combine for efficient CDC. Let's test your knowledge!
