What is orchestration?
1. What is orchestration?
There's much more to understanding data pipelines than the three phases in our data engineering framework. In fact, if someone were to ask me which single aspect of building a data pipeline does the most to level up its power and robustness, I'd certainly answer automation. We'll dive into this concept in this module.
Before getting into the details, here's a quick note on the ITD framework that we've used so far. This framework has made it easy for us to understand the core moving parts of a data pipeline. Thanks to the framework, we know that the core components of a data pipeline are ingesting data, performing transformations against that data, and delivering a data product to a consumer. You've also learned how to use the most practical, powerful features in each of those phases to get up and running fast with building data pipelines in Snowflake.
With the exception of dynamic tables, much of what we've done so far has had little to no automation in it. Ingesting data has involved manual steps and coordination, and so have our data transformations and our data delivery. We also ran our SQL and Python scripts manually. All of that was by design. We learned a lot by doing it this way. This approach helped us get hands-on with Snowflake and get a feel for the code and the platform when building pipelines.
But automation is a key concept that breathes life into a data pipeline. It can take a pipeline from feeling like an overly hands-on manual process to a continuous machine with its own running engine. When building a robust data pipeline, automation underpins all aspects of the pipeline.
So what exactly can be automated within a pipeline to really level it up? Well, just about anything, really. And that's where the beauty and power of automation lie. It's very common, and in fact often the norm, to automate things like ingestion processes.
For example, it's very common to automate the ingestion of data files into Snowflake from cloud object storage. That `COPY INTO` command that we used earlier? You can automate it so that it runs, say, weekly, daily, or hourly. You might also automate transformations. It's very common to write complex transformation logic within a stored procedure and then automate that stored procedure to execute after new data has been ingested, as an example. Another common, powerful technique is to automate the processing of a stream so that transformations and aggregations happen shortly after the underlying data has been updated. You also know that dynamic tables can help with automation because you can specify a target lag that controls how often the table refreshes.
The great thing about automating these sorts of things is that data products downstream get all of the benefits. For example, rather than repackaging or rebuilding applications on a daily basis because the underlying data has changed, automation can help ensure that the data products you're delivering to the application are fresh and up to date. The same is true for, say, a machine learning model. Perhaps you want to make sure the machine learning model doesn't drift, so you retrain it on a weekly or monthly basis, and you use automation in some way to help you maintain the model with fresh data. Those are just a couple of examples of what's possible by adding automation to your pipelines. The opportunities are vast.
One last thing before we kick things off. Why is this module called orchestration if we're talking about all things automation? Well, I like to think of orchestration as automation at scale. With so many moving parts that can be automated, the name of the game quickly goes from automating one specific thing to figuring out how to harmoniously orchestrate the automation of hundreds of different things. You'll find that to be true in practice as well.
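
As a rough sketch of the examples above, the following Snowflake SQL shows a scheduled `COPY INTO`, a stream-driven transformation, and a dynamic table that refreshes on its own. It relies on tasks, which this module goes on to cover. Every object name here (`pipeline_wh`, `orders_stage`, `raw_orders`, `orders_stream`, `order_changes`, `daily_order_totals`), the column names, and the CSV file format are assumptions made for illustration, not objects from the course, so adapt them to your own account.

```sql
-- Hypothetical objects: warehouse pipeline_wh, stage orders_stage,
-- table raw_orders (order_id, order_date, amount), stream orders_stream
-- on raw_orders, and table order_changes.

-- 1. Run the COPY INTO load at the top of every hour.
CREATE OR REPLACE TASK load_orders_task
  WAREHOUSE = pipeline_wh
  SCHEDULE = 'USING CRON 0 * * * * UTC'
AS
  COPY INTO raw_orders
  FROM @orders_stage
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);  -- assumed file format

-- 2. Check the stream every minute, but only run when it holds new changes.
CREATE OR REPLACE TASK process_orders_stream_task
  WAREHOUSE = pipeline_wh
  SCHEDULE = '1 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM')
AS
  INSERT INTO order_changes (order_id, amount, change_type)
  SELECT order_id, amount, METADATA$ACTION
  FROM orders_stream;

-- 3. Let Snowflake keep an aggregate within about an hour of its source.
CREATE OR REPLACE DYNAMIC TABLE daily_order_totals
  TARGET_LAG = '1 hour'
  WAREHOUSE = pipeline_wh
AS
  SELECT order_date, SUM(amount) AS total_amount
  FROM raw_orders
  GROUP BY order_date;

-- Tasks are created in a suspended state, so resume them to start the schedule.
ALTER TASK load_orders_task RESUME;
ALTER TASK process_orders_stream_task RESUME;
```

Note that the stream task still runs on a schedule. The `WHEN` condition simply skips a run when the stream has nothing new, which keeps the warehouse from spinning up for empty work.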
In this module, we'll cover a couple of the most important and powerful automation techniques with Snowflake. The first is tasks. Tasks are the magic behind automation. We'll specifically cover user-managed tasks, but I'll also lightly touch on serverless tasks. We'll also cover how to chain tasks together for broader automation. Let's get started.
2. Let's practice!
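
To make the task types and chaining mentioned above a little more concrete, here is a minimal sketch in Snowflake SQL. It assumes a hypothetical warehouse `pipeline_wh`, stage `orders_stage`, table `raw_orders`, and stored procedure `transform_orders()`, none of which come from the course. It also assumes the stage already has a suitable file format attached.

```sql
-- Hypothetical objects: warehouse pipeline_wh, stage orders_stage,
-- table raw_orders, stored procedure transform_orders().

-- Root task (user-managed): runs on a named warehouse every 60 minutes.
CREATE OR REPLACE TASK ingest_orders_task
  WAREHOUSE = pipeline_wh
  SCHEDULE = '60 MINUTE'
AS
  COPY INTO raw_orders FROM @orders_stage;

-- Child task: has no schedule of its own and runs after the root task completes.
CREATE OR REPLACE TASK transform_orders_task
  WAREHOUSE = pipeline_wh
  AFTER ingest_orders_task
AS
  CALL transform_orders();

-- Serverless variant of the root task: no WAREHOUSE parameter, so
-- Snowflake manages the compute. Left suspended here; shown for comparison.
CREATE OR REPLACE TASK ingest_orders_serverless_task
  USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'
  SCHEDULE = '60 MINUTE'
AS
  COPY INTO raw_orders FROM @orders_stage;

-- Resume the child first, then the root, so the chain starts running.
ALTER TASK transform_orders_task RESUME;
ALTER TASK ingest_orders_task RESUME;
```

The child task is created while the root is still suspended, and the root is resumed last. That ordering is a deliberate choice in this sketch: it avoids the root firing before the rest of the chain is ready.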