Automation with tasks

1. Automation with tasks

Tasks are the primary object in Snowflake that you'll use for automation. A task is essentially a scheduled operation. Tasks can be used to execute SQL statements, stored procedures, queries, and more. There are two types of tasks in Snowflake, user-managed tasks and serverless tasks. In this video, I'll cover the former, user-managed tasks. But because it's so easy to go from one type of task to the other, you'll also get some exposure to serverless tasks. The key difference between these two types of tasks is the management of the compute resources needed to run the task. With a user-managed task, a user in Snowflake assigns a compute resource to a task and continuously manages it. With a serverless task, Snowflake does the heavy lifting. Snowflake will right-size the compute resources for you so that you don't have to worry about managing a virtual warehouse. And all it takes to go from one type of task to the other is a one-line code change. I'll show this shortly. You'll specify the schedule on which a task should run using a cron expression. Cron expressions give lots of fine-grained control over how often the task should run. Finally, you'd pass in a thing that you want to automate. One of the most commonly automated things is a call to a stored procedure that might contain some critical logic. You could also automate things like SQL queries, email notifications, and more. You can also combine tasks with streams for workflows that allow you to continuously process updates to a table, for example. In fact, that's exactly what we'll do in this video. In any case, that's what's great about tasks. They're really flexible in what they can automate. Generally, it's very common to use tasks to automate the ingestion and transformation of data. This is a pattern that you'll come across very often. You can imagine, for example, using a task to run a `COPY INTO` command on a given schedule and using a task to call a stored procedure that performs transformations against that data after it's been ingested. And that's just a quick example. There's a lot more that you can automate. Let's go ahead and create a task together and use it to automate a stored procedure. Now's a good time to pause the video in case you need to log into your Snowflake account. Earlier, we created a stored procedure that processes a stream on the order header table. The logic in the procedure calculates and writes the daily sales in Hamburg, Germany to a table. If we wanted to keep that data fresh, we'd need to manually call the stored procedure over and over again. We can instead maintain our data freshness by automating a call to the stored procedure using a task. Yes, we did this already with a dynamic table. But again, you should know how to implement both types of approaches. Knowing both approaches will help you understand which approach is best in your own pipelines. Okay. Start by navigating to the orders headers task SQL file in the module 5 folder. Copy the contents of this file and paste them into a new SQL worksheet. Let's take a look at what we have. We have a task called process orders header Sprock. Sprock is shorthand for stored procedure, by the way. I specify that the task should use a virtual warehouse compute warehouse when executing this operation. This line of code automatically makes this task a user-managed task. If I specify a warehouse during task creation, Snowflake will recognize it as a user-managed task. If I omit this parameter, Snowflake will recognize it as a serverless task and know to do the right sizing for us. Next, I specify how often it should run by passing in a cron expression to the schedule parameter. This is the cron expression that signifies every minute. I encourage you to read up on cron expressions to learn more about how they're written. For now, you can trust this is the correct cron expression for executing this task every minute. Finally, I specify the operation that I want the task to execute. In this case, it's a call to the stored procedure process orders header Sprock. Let's execute the SQL that sets our context and creates a task. If you refresh on the left-hand side, you'll see the task was created. If you refresh on the right-hand side, you'll see the task was created successfully in the raw POS schema. You'll have to explicitly start the task for it to kick off. I do this on the next line by using alter task, name of our task, and then resume. Now, click on the task within the object picker and open it in a new tab. Next, click on run history. The task has been kicked off and scheduled to run. Okay, we've now automated the stored procedure to write to the daily sales Hamburg T table every minute. Let's navigate back to the worksheet and query the table to see what's in it. Okay, great. Let's insert some dummy data for a sale in Hamburg into order header. Remember, we have a stream on this table, so an insert operation on this table will be captured by the stream and processed in our stored procedure, which is being called every minute by the task that we just created. Okay, let's query our daily sales Hamburg T table once more. I'll wait a minute before doing this so that the procedure can be executed by the task. And there it is. That's the correct information based on the dummy data that we manually inserted into order header. Cool. Okay, we probably don't want to burn credits by having this task run every minute. I'd be okay with it running every 24 hours. Let's stop the current task. The code in comments lets you recreate the task and specify that it should run every 24 hours. I'm not going to run this code for now since I'm going to use the current definition of the task in the next exercise. But I did want to provide the code here just for completion. Oh, and by the way, we've done all of this in SQL, but you can also do this in Python using Snowflake's Python APIs. Okay, let's quickly step back and contextualize tasks. Up until now, just about everything we've done with the data in our pipeline has been pretty manual, like the ingestion and transformation of data. But tasks completely change that. With tasks, you level up your data pipeline game by taking manual processes to true continuous workflows. And with serverless tasks, management is even easier because you don't have to worry about managing compute resources for your task. Let's recap what you learned. You learned what a task is. You learned about the two types of tasks, user-managed and serverless, and the main difference between them. You learned that tasks add automation to your pipeline and that they're really flexible in what they can automate. You learned how to define and use tasks in SQL. And you also learned that you can do all of this in Python if you wanted to, using Snowflake's Python APIs. Coming up, we'll talk about how you might use multiple tasks in the context of a data pipeline and how you might chain them together to form an overall sequence of events.

2. Let's practice!

Create Your Free Account

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.