
Source control in Snowflake with git

1. Source control in Snowflake with git

By now, you know that a key tenet of DevOps practices is source control. In this exercise, we'll implement source control for our pipeline by connecting Snowflake to GitHub using Snowflake's Git integration. In the next video, we'll build the pipeline. Thankfully, we won't need to build it from scratch, as all of the files that we'll need to set up the pipeline are in the GitHub repository that we'll link to.

Let's go ahead and link the repo to our Snowflake account. This is a good time to pause the video if you need to log into your GitHub account. Start by navigating to the Advanced Data Engineering Snowflake repo on GitHub. This is the repo that we'll connect to Snowflake. It also contains files we'll need for this exercise and for subsequent exercises. To connect this repo to Snowflake, start by creating a fork of the repo. This will create a copy of the repo in your GitHub account, which will allow you to write to it. Navigate to the top here and click on Fork. You can leave everything else as is on this page and now create the fork. Okay, great. You now own a copy of the repo in your account. Before we proceed, I want you to know that forking the repo is the most important step in the module. If you skip forking, you won't be able to follow along in the rest of the module.

Okay, with our fork ready, let's briefly cover how we'll connect our Snowflake account to the GitHub repo. There are three things you need to do to connect your account to the repo. The first is to configure authentication to access GitHub. All this means is creating a personal access token, or PAT, within GitHub and using it as a credential within Snowflake so that Snowflake can communicate with GitHub. The second is to configure an API integration within your Snowflake account. This integration will use the PAT and specify which URLs your account can interact with. Finally, we need to create a Git repository object within Snowflake. This object essentially acts like an external stage, only this time it points to a GitHub repo. We'll be able to access the files within the repo directly from our Snowflake account thanks to the API integration we defined.

Okay, let's do this. Now is a good time to pause the video if you need to log into your Snowflake account. Navigate to Snowflake and create a new SQL worksheet. Next, navigate to the API integration SQL file inside of the Module 1 folder. Copy the contents of the file and paste them into the SQL worksheet. Set your context by running the first three lines. This file has several SQL statements that we need to complete.

Let's start by creating the secret. The secret requires our GitHub username and a personal access token, which we can generate in GitHub. Start by entering your GitHub username here. Next, we'll need to generate a personal access token from GitHub and use it here as a password. This is so that we can programmatically access GitHub. Navigate to GitHub. Click on your profile and then click on Settings. On the left, click on Developer Settings. Click on Personal Access Tokens, then click on Fine-grained tokens. Click Generate new token. Let's add a name for our token. I'll call it Snowflake Git Integration Token. Under Repository Access, click on All Repositories. Expand Repository Permissions and scroll down to Contents. Select Read and Write. This will ensure that we'll be able to read from and write to the repo that we'll connect. Finally, click on Generate Token at the bottom. A token should be generated for you.

You should treat this token as a password and keep it secure. Also, once you've generated a token, GitHub will only show it to you once on this page. If you close the page or forget to write it down, you won't be able to access the token again, and you'll need to repeat this process. So be mindful of this token. Okay, copy the token to the clipboard. Next, navigate back to Snowflake and paste it in as a password. Run the statement. Okay, great. We've created a secret.
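For reference, the secret statement described above might look roughly like the following. This is a minimal sketch, not the exact contents of the course file: the secret name github_pat is inferred from the name used later in the video, and the username and token values are placeholders you would replace with your own.

```sql
-- Sketch of the secret that stores the GitHub credentials.
-- Assumption: the secret is named github_pat; run this after setting your context.
CREATE OR REPLACE SECRET github_pat
  TYPE = password
  USERNAME = '<your-github-username>'         -- placeholder: your GitHub username
  PASSWORD = '<your-personal-access-token>';  -- placeholder: the PAT generated in GitHub
```

Because the token is stored as the PASSWORD value, avoid committing the completed statement back to the repo.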
Let's now complete the API integration. We have some initial statements stubbed out that we need to fill in. Start by giving the integration a name. I'll call it Git API Integration. It's missing values for the API_ALLOWED_PREFIXES and ALLOWED_AUTHENTICATION_SECRETS fields, so let's add those. For the first field, add the URL to your GitHub profile. All this means is that our API integration should allow us to access any resources under this specific URL, which represents our GitHub user account. This is perfect because if you check, you'll notice that the path to the forked repo lives as a resource under this URL. To find the URL, navigate to GitHub. Click on your profile picture at the top right, then click on Your Profile. Copy the URL in the address bar and paste it into the API_ALLOWED_PREFIXES field. Next, specify the secret that we just defined. Type github_pat. That's it. Let's run the statement. And there you have it. We now have an API integration defined. For more details on API integrations, be sure to check out the relevant Snowflake documentation, as there are other possible values for the API_PROVIDER field here.

Finally, let's complete our Git repo object. We'll need to specify the values for the three missing fields. For the API integration, specify our Git API integration. For origin, specify the path to the GitHub repo that you forked. Let me show you where to grab this in GitHub. Navigate to GitHub. Click on Your Repositories. You should see the forked repo there. Click on it. Next, click on Code. Under Code, copy the address. Navigate back to Snowflake and paste it as a value for origin. Okay, we're almost done. All we have to do now is specify the credentials to access this repo on GitHub. Here, we specify the secret we defined earlier. I'm using the fully qualified path of course_repo.public.github_pat. That's it. Let's run the statement, and if successful, you should see this output. Great.
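Put together, the two completed statements might look something like this. Treat it as a sketch under assumptions: the object names git_api_integration and advanced_data_engineering_snowflake are illustrative renderings of the names spoken in the video, and the GitHub URLs are placeholders for your own profile and fork.

```sql
-- API integration: limits which URLs Snowflake may call and which secrets it may use.
-- Assumption: the integration is named git_api_integration.
CREATE OR REPLACE API INTEGRATION git_api_integration
  API_PROVIDER = git_https_api
  API_ALLOWED_PREFIXES = ('https://github.com/<your-github-username>')  -- placeholder: your profile URL
  ALLOWED_AUTHENTICATION_SECRETS = (github_pat)
  ENABLED = TRUE;

-- Git repository object: points at the forked repo, much like an external stage.
-- Assumption: the repository object name is hypothetical.
CREATE OR REPLACE GIT REPOSITORY advanced_data_engineering_snowflake
  API_INTEGRATION = git_api_integration
  GIT_CREDENTIALS = course_repo.public.github_pat
  ORIGIN = '<address-copied-from-the-Code-button-of-your-fork>';  -- placeholder: your fork's clone URL
```

The origin must fall under one of the allowed prefixes, which is why the profile URL works: the fork lives beneath it.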
We now have a GitHub repo connected to our account. You can list all the connected repos by running the SHOW GIT REPOSITORIES command. There it is. To verify, you can navigate to the Object Explorer. Click on the database and schema, and you should see the GitHub repo listed as an object. If you click on it, you should be able to browse the files that are in the repo directly within Snowflake. How cool is that?

Okay, with this connection, we can now use source control for our work in Snowflake. We can also easily run files from the repo directly from Snowflake. We're well on our way to implementing some DevOps best practices for our pipeline. Great job. Join me in the next video as we use this integration to set up the pipeline in your Snowflake account.
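As a rough illustration of what the connection makes possible, the commands below list the connected repos, refresh the local copy, browse its files, and run a file from it. The repository object name, the branch name main, and the file path are all hypothetical; substitute the names from your own account and fork.

```sql
-- List every Git repository object visible in the current context.
SHOW GIT REPOSITORIES;

-- Pull the latest commits from GitHub into the repository object.
-- Assumption: the object name matches the one you created.
ALTER GIT REPOSITORY advanced_data_engineering_snowflake FETCH;

-- Browse files on a branch, using the same syntax as listing a stage.
-- Assumption: the fork's default branch is named main.
LS @advanced_data_engineering_snowflake/branches/main;

-- Run a SQL file straight from the repo (hypothetical path).
EXECUTE IMMEDIATE FROM @advanced_data_engineering_snowflake/branches/main/<path-to-file>.sql;
```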

2. Let's practice!
