Automatic transformations with Dynamic Tables
1. Automatic transformations with Dynamic Tables
Earlier, we covered streams and you learned that streams give you an incredible amount of control over your data transformations. With streams, you can focus on processing incremental changes to an underlying table, but you can go even further by focusing on specific types of changes too. For example, if you wanted to focus on processing only delete operations against an underlying table, you could do that with a stream. But even with this fine-tuned control, you'll still need to make choices around what you'll do with the data tracked by the stream. And if you have a lot of streams, coupled with automation, this can get complex pretty quickly. Depending on what you're trying to do, a dynamic table might be a better option for you. A dynamic table in Snowflake is a special type of table that automatically updates itself with the latest data using a predefined query. This helps ensure that the table always reflects the most current information from its source tables. Unlike a stream, you don't need to write any additional logic to process the changes or types of changes to the underlying table. You can mostly just set up the dynamic table and move along, meaning you can get to a desired end state with your data really fast. It's also really easy to configure the freshness for a dynamic table. In one line of code, you can specify how often the dynamic table should be refreshed. For example, you could set the refresh rate for the dynamic table to immediate, which means that the table would update as soon as changes to the base tables are made. Or you could set it to, say, every 24 hours, because perhaps you want to aggregate some information over a 24-hour period. It's up to you. This flexibility is intended to help you meet your use case easily, without the overhead of managing scheduled operations. Let's create a dynamic table together so that you can get a feel for the pattern. In the previous video, we created a stored procedure that processed the order header stream. The stored procedure identifies whether any recent sales occurred in Hamburg, and if they did, aggregates the total amount of sales for that day and writes it to a table. The natural next step would be to automate that entire process, which we'll do in the next module. In this video, we'll create a dynamic table that does the exact same thing. Now, you might be wondering, why? Let me explain. Well, we're not really doing anything super complex with our stream. We're not processing any delete or update operations on the order header table. We're only processing insert operations to keep track of new orders. So although we can certainly keep the stream, and we will, it might be possible to simplify our architecture with a dynamic table. Second, the stored procedure writes the aggregated sales amount to a table. With a dynamic table, we're able to immediately specify where this data should be stored, so we can save a step here. And finally, dynamic tables make it easy to specify freshness. In one line of code, I'll specify that our table should update once every day. We haven't learned about tasks for automation yet, but with this dynamic table, we could likely replace our combination of streams, stored procedures, and tasks. Also, you should know how to implement both approaches. This way, you can decide which is best for your use case. In fact, we'll dive a little into the pros and cons of each towards the end. Okay, let's create our dynamic table. Now's a good time to pause the video if you need to log into your Snowflake account. Navigate to the Hamburg Sales dynamic table SQL file in the module 3 folder. Copy its contents and paste them into a new SQL worksheet. Run only the statements that set your context. Okay, let's now take a look at our dynamic table's definition. There are four main things to look at here. We use create or replace dynamic table to create the table. We specify the warehouse that should be used when populating the table. We set a target lag, which indicates how often this table should refresh. It's currently blank, but we'll set it shortly. And finally, we specify the query that should be used to populate the table. This is the same query from the stored procedure, only in SQL of course. For the purposes of this exercise, let's set the target lag to 1 minute. Now run the block of code that creates a dynamic table. Okay, great. It's been created. Let's take a look at what's inside. Run the next line of SQL that queries the dynamic table. Oh, look, there's a dummy data that we inserted earlier. That was fast. If you deleted that dummy data, you won't see anything in the table. Let's insert more dummy data. Run the insert into command. The only thing that we change in this dummy row is the date and sales amount. Everything else is the same, and that's fine. Now let's create the table again. Hmm, no changes yet. Oh, right. That's because we set the refresh rate to a minimum of 1 minute. You could always set it to immediate, but I want to show off a non-immediate refresh rate in action. Okay, let's wait. Now let's query it again by running the same line of SQL. There it is. A new sale for a new date. Let's test out our table once more. In the insert into statement, modify the order amount and order total field. You can set them to be anything. Keep everything else the same. Run the insert into command. Okay, now we have two sales for our date of March 9, 2024. Give it a minute and then query the dynamic table once more. Okay, no new rows, but we do see an updated total sales value for the date of March 9, 2024. That makes sense, since we're aggregating the total sales amount by date. How cool is that? By now, you've learned how to process incremental changes to data with streams and stored procedures. You've also learned how to use dynamic tables for automatic processing of changes to underlying data. Each approach has its own benefits and tradeoffs, and it's important that you learn how to implement both. The approach that you'll use for your data pipelines will, of course, depend on your use case. In practice, chances are that you'll have source tables with raw data being updated on a continuous basis, likely due to an automated process somewhere. If you need the fine-grained tracking that streams provide because your pipeline might perform complex logic based on those details, then streams are a great choice. Combining them with stored procedures and tasks powers them up even more by extending what you can do in your procedural logic and giving you more control over the refresh schedule. But maybe your pipeline doesn't need any of that. Maybe your base tables aren't likely to evolve significantly, meaning that overall, they're relatively stable. And maybe all you're interested in is specifying the target data freshness for your pipelines. In a scenario like this, dynamic tables are a great way of automating your transformations without the overhead of managing streams or tasks, which, again, we'll cover shortly. Okay, let's recap what you learned. You learned what a dynamic table is, what problems they help solve, how to define and use a dynamic table, and how to think about benefits and tradeoffs when evaluating streams against dynamic tables.2. Let's practice!
Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.