Declarative approach with CREATE OR ALTER
1. Declarative approach with CREATE OR ALTER
Now that you know a bit about imperative and declarative approaches for DCM, let's dive into Snowflake's declarative approach functionality, the create or alter command. You might be familiar with Snowflake's create or replace command, which will create an object if it doesn't exist, that's the create part, or will drop an object and recreate it if it does exist, that's the or replace part. The create or alter command is a little different. Create or alter allows you to create an object if it doesn't exist, or modify it in place by applying only new changes to that object without dropping the object. It's pretty neat, right? This means we can easily use a declarative approach to manage objects in Snowflake. And when we couple this with Snowflake's Git integration for source control, we have an approach that allows us to not only easily modify and evolve object definitions over time, but to also keep track of all changes. All of this is strongly in line with DevOps best practices. Here's how you would typically use a command. You define a new object using create or alter. This is the initial definition of the object. Say weeks go by, and now you need to change the definition of the object. You'd simply locate the initial definition and make the new changes there. When you run the command, Snowflake will do the hard work of recognizing only the new changes and applying them for you. This is great because you can incrementally update objects when business needs require them to evolve. What I find even neater is that any associations on that data are preserved. I specifically mean that things like tag associations, attached policies, and privilege grants on those objects. Those are not affected and simply remain intact. Now, be careful using this command. It's so powerful that you can also introduce changes like removing object properties, which could affect data. For example, if a modification to a create or alter table statement calls for dropping a column, any data that was in that column will also be dropped. You can, of course, recover that data using Snowflake's time travel feature, but I call this out to help you be mindful when wielding this powerful statement. And finally, if you run this command and it turns out that the object already matches the definition, then the object will simply remain unchanged. Let's get hands-on with our declarative approach and use this command in our pipeline. Now's a good time to pause the video if you need to log into your Snowflake account. Okay, let's start by creating a couple of branches on GitHub. Start by navigating to your forked repo on GitHub. Click the branches dropdown and click view all branches. Next, click new branch. Type in staging. Click create new branch. Okay, great. This branch represents work that we'll do against our staging data environment, which we called staging tastybytes. Anything merged into the staging branch will deploy into this data environment. We'll pull this branch into Snowflake shortly. Let's create another branch. Click new branch once more. And now create a branch called fix missing data. Make sure you select staging from the dropdown at the bottom. This will ensure that the fix missing data branch is branched off of the staging branch. If you don't see it, you may need to refresh your browser. The flow that we want to follow to introduce changes into our pipeline is create a feature branch off of staging to introduce changes. Merge our feature branch into staging. This will deploy our changes into our staging environment. Finally, merge from staging into main. This will deploy our changes into the production environment. Don't worry about the details of how this will work just yet. We'll implement all this in the next exercise. All right, we're going to use a Snowflake notebook to do our development work. The cell-based development environment will be perfect for helping us incrementally test code and introduce fixes. This will help us make sure that our changes work as expected before pushing them up to the repo for review. In Snowflake, click on projects, then notebooks. At the top, click on the dropdown and click create from repository. This will open a modal that will allow you to name the notebook and select the notebook file from within a repository. Name the notebook Hamburg pipeline. Next, click on select IPy and B file. This will open a modal for you to select the file. We must first specify the database where the repo lives. Click select database and select course repo. Click on the repo we have connected. Now, let's pull in the branches that we created on GitHub. Click on the fetch button to bring in the branches. Next, click on branch and set the branch to fix missing data. Remember, this is the feature branch we want to do our development work on. Next, locate the notebook file. You'll find it inside of module one, Hamburg weather, notebooks. Click on the IPy and B file. With it highlighted, click select file. Next, complete the rest of the modal. For notebook location, select staging Tasty Bytes and select the public schema. Leave the remaining settings as they are. Click create. This will create and open the notebook for you on the feature branch. Okay, so what exactly are we going to do? We're going to fix our broken pipeline using DevOps best practices. We'll do some exploratory work on this feature branch against our staging environment. We'll introduce and test some changes out, and when we're ready, we'll apply those changes to the object definition in the source code and push the changes up for review. In the following videos, we'll implement automated deployments, but for now, let's dive back in and fix our pipeline. So what might be wrong with the app? I remember that it rendered perfectly well, but didn't show any data, so I suspect there's some data missing. Let's check things out. Start by running the first cell that sets our context. This helps us make sure we're working against our staging environment. Next, let's take a look at the tables that contain the raw data. These are in the raw POS schema. Run the next cell. Okay, yes, it looks like there are two tables that are missing data, country and daily sales Hamburg. I know that daily sales Hamburg derives its data from the raw tables, so I suspect the reason it doesn't have any data is because the country table doesn't have any data in it, and if we're analyzing food truck sales for cities around the world, I imagine that country data is critical to the pipeline, so let's fix this. Let's take a look at the load tasty bites SQL file in VS Code. It looks like the table is created just fine, but why doesn't it have data? Scrolling down, I see that all the data loading is at the bottom of the file. Aha, it looks like the data loading for this table was commented out, so the data was never loaded into the table. That makes sense. Let's run that in the notebook. I'll copy it, paste it in a new SQL cell, and uncomment it. I'll replace env with the word staging. You could do this programmatically, but for two lines, it's not worth the effort to me. Let's run the cell. Uh-oh, there's an error. It looks like we're trying to load more columns than we defined. Maybe that's why it was commented out, so let's fix this. Let's go back and look at the load tasty bites SQL file in VS Code. Copy the table definition for the country table and paste it into a new SQL cell. Let's do two things. First, I'll rebuild this table in declarative fashion. Edit the first line to be create or alter. Update the env word to staging. Run the cell to update the table, and that's it. That's all it takes to manage an object declaratively in Snowflake. How easy was that? Next, I'll add the missing column. I'll type city underscore ID, number, parentheses, 19 comma zero, close parentheses. Be sure to add this at the end of the column list. Run the cell once more. Since it's declaratively managed now, it's easy to add a column in without needing to drop and recreate the table. Okay, that worked. There's, of course, still no data in it. Let's run our previous cell again to load data into it. Okay, great. It loaded data. Let's check our app out. Open the editor. Make sure the env environment is set to staging. You can see we templatize here as well with Python FString. Run the code and there it is. The app is fixed. Looks like our changes work. Great job. This pattern of writing code, running code, and verifying the code is quite normal. It's all part of development work. This helps us make sure that we push up a pull request that we're confident does what we say it does. It also helps our colleagues to test and reproduce our code in their environments. Okay, now I feel really confident about my code. Let's create a pull request. Open VS Code. In the terminal, run git fetch. Next, type git switch fix missing data. This will switch us to the feature branch that we created earlier. To confirm, type git branch. Okay, we're on the correct branch. Open the load tastybytes SQL file. Let's add our solution here. We can simply copy the work from the notebook. Copy the table definition and paste it into the file. Make sure we re-templatize here. And let's also uncomment the `COPY INTO` command. Save your changes. Run git status. And we see that our file was modified. Next, run git add minus p. This will let me see what changes we made to the file as a sanity check. This looks correct, so I'll type y and hit enter. This also looks correct, so I'll do the same thing. Okay, let's commit our work. I'll type git commit minus m fix missing data. Let's now push up our branch. Type git push origin fix missing data. Okay, great, it was pushed up. Let's go to our forked repo on GitHub to create the pull request. You'll notice a message about recent changes on the branch. Click on the button here to compare changes and start the pull request. Be careful about how you create the request. Make sure that the repo base represents your fork. After doing that, make sure you select the staging branch as the base branch. The UI should show that you want to merge the fixed missing data branch into staging. Add a title, feel free to add more context, and then create the request. Okay, let's stop here for now and pick up in the next couple of videos. Great job. We did a lot in this exercise. We identified the issue with our pipeline, declaratively created the country table using create or alter, and loaded data into the table. This fixed our broken pipeline and faulty application. We also did all of our work on a feature branch and used a notebook to help us feel confident about our proposed changes. Finally, we pushed up the changes as a pull request for consideration, and now you're on your way to implementing DevOps best practices. With that, join me in the next video to learn about continuous integration and continuous delivery for data pipelines.2. Let's practice!
Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.