Database Change Management (DCM)
1. Database Change Management (DCM)
Let's now imagine that we need to introduce changes to our pipeline. How do we do that in a way that minimizes risk to the end user? We'll do this through a combination of source control and using a declarative approach for managing database changes. Before implementing this approach, let's understand a couple of common approaches to managing database changes. The practice of managing changes to database objects is cleverly known as database change management, or DCM. The terms database schema migration and schema change management are synonymous as well. My colleague, Jeremiah Hansen, has what I feel is a clear and comprehensive definition of this practice. He says, database change management, DCM, is a practice of defining all database objects in code in a Git repository, and then deploying those objects, including changes to those objects, to a database with the help of a dedicated tool. This exercise will cover the first half of that definition, namely defining objects in code and using Git to implement source control. We'll cover the deployment of those objects in upcoming videos. So let's dive deeper into database change management. Managing a database comes with all sorts of challenges, like managing the state and evolution of objects within it. I'm talking about changes like introducing new tables, adding columns to existing tables, modifying schemas, and much more. How exactly do data engineers handle these kinds of challenges? One of the most common ways to manage the state of database objects is through an imperative approach. An imperative approach handles changes to database objects in a step-by-step fashion and allows data engineers to carefully apply or remove changes to those database objects. Here's what I mean. Let's consider an object, like a table for example, that goes from state A to state D. Maybe it's getting new columns added, removed, or something else in it is being modified. With an imperative approach, it's the norm to have multiple scripts that gradually evolve the table to the desired end state. For example, I might apply script 01.sql to morph the table in a certain way, followed by script 02.sql, and then finally script 03.sql. And after applying script 03.sql, I might stand up and celebrate. This imperative approach is a common way of managing database state, but it also comes with its own challenges. For example, this process can be error-prone because data engineers have to make sure that scripts are applied in the correct sequence. Without a tool to help you keep track of this, this can quickly become confusing. This approach can also be cumbersome. In my previous example, I mentioned 3 scripts to get a table from state A to state D, but in practice, you may have tens, maybe hundreds of scripts to apply. And again, you have to make sure they're applied in the correct order, so evolving objects may be more time-intensive and nerve-wracking than previously thought. This approach also forces that work onto a human versus, say, software or a tool, which I'll get into more in a bit. And last but not least, if you have multiple databases, you now need to maintain the version or state of each database, and frankly, that can mean a lot of state management. So as you can imagine, all of this can quickly feel like a challenging undertaking. So what's the alternative? These challenges can be addressed using a declarative approach to database change management. With a declarative approach, you declare objects once and update their definition as it evolves, typically in a single source-controlled file. This means fewer files to maintain, along with a complete history of changes to the object, thanks to source control, making this a lot less error-prone. And there isn't necessarily a sequence to follow either, so you don't have to worry about making sure that multiple scripts are executed in the correct order. Now, of course, you may have other important files that also use a declarative approach, but the practice of running several scripts to evolve a single object is greatly minimized if not completely gone. This approach means we can introduce changes, safely test them out in an environment of our choice, collaborate with teammates on those changes, and keep track of it all. This also means we can minimize risks to the pipeline, and therefore the end user, all while maintaining DevOps best practices. So which approach is better, imperative or declarative? There isn't necessarily an approach that is better than the other. The approach you pick is going to depend on what fits your data environment and team the best. You may already be using an imperative or declarative tool for your approach. If it works for you and your team, then great, you should stick with it. In practice, we see that teams may use a combination of both approaches to meet their needs, and you most likely will need to use multiple tools to manage all Snowflake objects. We also see that teams augment their approach with third-party tools like SchemaChange or LiquidBase to help them with DCM. I won't go into detail on the various possible combinations of tools for these approaches in this video, but I am including a reading in this module that provides some more guidance. Be sure to check it out to get a more holistic view of approaches to DCM with Snowflake. In the context of DevOps, the declarative approach lends itself much better to DevOps practices for data pipelines, and this is the approach that we'll employ in our pipeline. Join me in the next video to get hands-on with our declarative approach with Snowflake.2. Let's practice!
Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.