Implementing continuous delivery for our data pipeline
1. Implementing continuous delivery for our data pipeline
Let's take our DevOps practices full circle and actually implement continuous delivery for our pipeline. For our pipeline, continuous delivery means four things. First, changes are source controlled; we'll use Git and GitHub for this. Second, changes are deployed and tested in a development environment in Snowflake before they're deployed to our production environment, which is also in Snowflake. Third, deployments to either environment are automated with a third-party tool. In our case, we'll use GitHub Actions, but you should know that several other popular tools can do the same job. And finally, we'll use tools to help speed up continuous delivery: in addition to GitHub Actions, we'll use Snowflake CLI to deploy changes to our Snowflake environments.

Before diving in, let's quickly reorient ourselves. Remember that the last change we made was to create a table declaratively using `CREATE OR ALTER`. This helped us fix our pipeline. We did all of this on a feature branch called fix missing data and pushed it up to create a pull request.

In this exercise, we'll set up the infrastructure for continuous delivery. We'll configure GitHub Actions to deploy changes to our staging and production environments on certain triggers. The general workflow is this: we push changes to GitHub, GitHub Actions recognizes the push, and it uses Snowflake CLI to deploy the changes to the specified environment. Sounds pretty cool, right?

So what is GitHub Actions? It's a CI/CD platform by GitHub that lets us automate our builds, tests, and deployments. It does this by following a set of instructions that we define in a workflow file. It's a very common tool for automating these workflows in engineering teams.

Let's configure GitHub Actions to automate our deployments to Snowflake. Start by navigating to our forked repo on GitHub.
Click on Actions. This is where we can configure an action for our repo. Click on the green button indicating that you understand your workflows. Click on New Workflow. Click on Set Up a Workflow Yourself. You'll be taken to a file editor. GitHub is doing some work for us here: it's creating the directory and the file that will hold the instructions our GitHub Action needs to follow. Notice that it creates a `.github/workflows` directory and places a file in there called `main.yaml`. You can name the file whatever you want. I'm going to leave mine as is for now.

So what should we write in here? Workflows in GitHub Actions can be customized quite extensively, so to stay on the right track, I've written some of the workflow for us. Navigate to the Module 1 folder in our repo and locate the `main.yaml` file. It's in the Hamburg weather folder. Copy its contents and paste them into the `main.yaml` file that GitHub created for us.

Let's quickly take a look at what this does. At the top, we specify that this workflow should trigger when there is a push to, or a pull request merged into, either the staging or main branch. Main represents our production branch in this case. For this workflow to be able to deploy into our Snowflake account, it needs credentials for that account, so we specify them as GitHub secrets, which we'll set up shortly. We then set environment variables that the workflow uses, based on which branch is merged to or pushed to. These variables match the environments that we have set up in our Snowflake account, so that when the workflow deploys our changes into Snowflake, it knows which environment to deploy to. The workflow installs Snowflake CLI and then fetches the latest changes in our repo. Finally, the workflow calls `snow git execute`. Remember that you used this command earlier to set up our data environments. As you can see, it'll execute the data directory in the pipeline folder.
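To make that description concrete, here is a minimal sketch of what a workflow along these lines might look like. It is not the course's `main.yaml`. The repository object name `my_db.my_schema.my_repo`, the `pipeline/data` path, the `env` variable passed with `-D`, the Python version, and the use of `--temporary-connection` together with the `SNOWFLAKE_ACCOUNT`, `SNOWFLAKE_USER`, and `SNOWFLAKE_PASSWORD` environment variables are all assumptions made for illustration, so check them against your own repo and the Snowflake CLI documentation.

```yaml
# Hypothetical sketch of a deployment workflow; names and paths are assumptions.
name: Deploy Data Environment

# Run on pushes to staging or main. Merging a pull request into either
# branch also produces a push to that branch, so it triggers this too.
on:
  push:
    branches:
      - staging
      - main

env:
  # Assumed fully qualified name of the Git repo object in Snowflake.
  REPO_NAME: my_db.my_schema.my_repo
  # Map the branch to a Snowflake environment: main -> prod, otherwise staging.
  DEPLOY_ENV: ${{ github.ref_name == 'main' && 'prod' || 'staging' }}
  # Credentials come from GitHub secrets and are never written in the file.
  SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
  SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }}
  SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install Snowflake CLI
        run: pip install snowflake-cli

      - name: Fetch the latest changes into the Git repo object
        run: snow git fetch "${REPO_NAME}" --temporary-connection

      - name: Deploy to the environment for this branch
        run: |
          # GITHUB_REF_NAME is set by GitHub to the branch that triggered the run.
          snow git execute \
            "@${REPO_NAME}/branches/${GITHUB_REF_NAME}/pipeline/data/*" \
            -D "env='${DEPLOY_ENV}'" \
            --temporary-connection
```

With a layout like this, the same file can live on both branches: the branch name selects which copy of the scripts runs, and `DEPLOY_ENV` tells those scripts which Snowflake environment to target.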
The path passed to `snow git execute` is where you could configure what you want to execute. Let's complete the file before we commit our work. On line 64, this statement is configured to deploy the correct data environment based on what branch we're on, but it's missing the information it needs. Delete fixme and type `${GITHUB_REF_NAME}`. This will interpolate the value of `GITHUB_REF_NAME` into the directory here. `GITHUB_REF_NAME` is a default GitHub Actions variable that holds the name of the branch. On the next line, we need to do something similar and set the environment variable. Type `env.DEPLOY_ENV`. You can see at the top that `DEPLOY_ENV` is set based on the target branch we're on. If we're on the staging branch, it's set to staging. If we're on main, it's set to prod.

So let's save our work. Commit the changes and select the option to create a new branch in the process. This way we can first put this workflow onto the staging branch, then later put it on main. Create the pull request. Be sure to select staging as your base branch. Then proceed to merge it. Repeat the process to get this change into main: create a new pull request, and this time select main as your base branch and merge staging into it. Proceed to merge it. Now we have our workflow on both branches.

That's it. Our workflow is now set up. Our changes are now on GitHub, and we need to sync our Git repo object in Snowflake to reflect the latest changes. Navigate to Snowflake and locate the Git repo object. Click the Fetch button. Great. Now we're synced to what we have on GitHub.

To get details on when this action runs, navigate to Actions in GitHub. Oh, look at this. This entry indicates that the workflow attempted to run but encountered an error. You'll see here that it's broken because of missing credentials. So let's set that up. Click on Settings. Next, click on Secrets and Variables on the left-hand side. Then click on Actions.
Our workflow is looking for secrets called `SNOWFLAKE_USER` and `SNOWFLAKE_PASSWORD`, so let's create these. Start by creating `SNOWFLAKE_USER` and input your Snowflake username. Save the secret. Now create another secret for `SNOWFLAKE_PASSWORD` and paste in the password to the account. If you've configured your local `config.toml` file correctly, as indicated in a previous reading, then you can find this info there. Enter the password and save the secret. Let's create one more secret for our account identifier. Call it `SNOWFLAKE_ACCOUNT`. For the secret, paste in the account identifier. You can grab this from your local `config.toml` file or from within Snowflake. Save the secret.

Okay, that's it. Let's try out our action. Navigate to the pull requests. Merge in the pull request. Remember that it's merging into staging. Let's now go to GitHub Actions and see what's happening. Navigate to Actions. Here you'll see the action running. Click on Deploy Data Environment. You'll see everything that is being run as part of the action. These are all the instructions that we specified in the `main.yaml` file. We'll need to wait for this entire process to succeed before we can go into Snowflake to observe the deployed changes. What's cool is that if this fails, you'll see the failures here too. Okay, looks like it was successful. Amazing!

Since we pushed our changes to the staging environment, we should navigate to the corresponding staging database in Snowflake, which represents that environment. Let's go ahead and confirm that this object was indeed created. And yes, those are our changes from earlier.

Let's now imagine that because this change looks good, we create a new pull request on GitHub against the main branch for a teammate to review. Navigate to your forked repo and then click on Pull Requests. Create a new pull request. We want to introduce our changes into production, so let's set our branches accordingly.
Select the forked repo as the base, then select main as the branch. For compare, select staging. It should be configured to show that staging is attempting to merge into main. Add any relevant details below and click Create Pull Request. Once the pull request, or PR, has been created, you would typically tag a teammate here for a review. We're not collaborating with a team here, so let's just imagine you're also the approver. Let's go ahead and approve the changes and merge them in.

Navigate to Actions once more and click on a running workflow. Click the one that's running against the main branch. Let's see if this is successful. Yes, it was. Let's confirm in Snowflake. We'll need to check in the production environment this time. There it is.

All right, you've implemented continuous delivery of changes to our pipeline. This will help us verify changes in a safe and collaborative way before they're rolled out to production. This is cutting-edge stuff that you're doing. Join me in the next video to recap what we've learned about DevOps with Snowflake.

2. Let's practice!