Set up the data pipeline using Snowflake CLI

1. Set up the data pipeline using Snowflake CLI

Now that we've connected Snowflake to the GitHub repo, let's set up our pipeline. To complete this exercise, be sure you've completed the instructions and the reading that precede this video. We'll need to do two things: first, load the data for the pipeline, and second, build the pipeline objects using that data. To do this, we're going to use Snowflake CLI.
What is Snowflake CLI? It's the command-line interface for working with Snowflake. It is super powerful and can do many things with your Snowflake projects. For example, you can use it to start Snowpark objects, execute Snowflake notebooks and SQL directly from the command line, bootstrap a Snowflake Native App, and much more.
We're specifically going to use the CLI's snow git execute command. This command executes a file or a series of files in a repository path. These files can be SQL files or Python files.
Here's how we'll use it. We'll use this command to set up our data environment in Snowflake. We'll run it against a directory that contains all of the data that needs to be loaded into our Snowflake account, and all of the pipeline objects built using that data. One really neat thing is that we can pass an argument to the command so that we can create staging and production data environments, and a corresponding data pipeline in each of those environments. We'll get more details on that setup in an upcoming exercise.
Overall, this command is handy for quickly setting up data environments within Snowflake, among other things. For example, if you require that several Snowflake accounts have a standard data environment, you can use this command to execute a script that sets up the environment for those accounts. That's essentially what we'll use the command for.
Okay, let's get started. Now's a good time to pause the video if you need to log into your Snowflake account. Let's start by taking a look at the files we'll run.
Navigate to the module one folder of the advanced data engineering Snowflake repo. Open the Hamburg weather folder. Expand the pipeline folder and locate the load tastybytes SQL file inside the data subfolder. This file contains all of the code to create our data environments. You can see this file is parameterized, which you can recognize by the templating placeholders that use the word env. In our CLI command, we'll pass in an argument whose value is interpolated wherever env appears in the file. This will help us easily create our two data environments. You'll see this in action shortly and it'll make more sense.
Now, navigate to the objects folder. This folder contains all the objects for our pipeline. The objects will be created using the data in our data environments. You'll notice a couple of things. First, the objects are also parameterized. This is so that we can create pipelines in each of these data environments. Second, you'll notice the objects are split up into folders. We'll learn why we're doing this in an upcoming exercise. We'll run the entire pipeline folder using the CLI and pass in arguments to place the pipelines in their corresponding data environments.
Finally, locate the app.py file in the Streamlits folder. This file contains the code to visualize and share the output of the pipeline using a Streamlit in Snowflake app.
Let's go ahead and build our pipeline. Now is a good time to pause the video if you need to open VS Code. In the terminal, start by typing snow git execute. This is the beginning of our command. Next, we'll pass in the path to the file we want to run. We want to set up our data environments first, so we'll pass in the path to the load tastybytes SQL file. Type the at sign, then advanced data engineering snowflake. This represents the Git repository object in our Snowflake account, which of course is linked to the GitHub repo that you forked. Next, we need to specify both the branch and the path to the file. So now type slash branches slash main.
Let's finish this up by typing the rest of the path to the file. Slash module one slash Hamburg underscore weather slash pipeline slash data slash load tastybytes.sql. Okay, don't run anything yet because we're not done with the command. If you already did, you might encounter an error. Not to worry, you can simply start again.
We need to pass in the variable that the file should interpolate. Type dash uppercase D followed by double quotes env equals single quotes staging close single quotes close double quotes. This will create a data environment whose name starts with the word staging. This way we'll know we're in the staging environment when we're in Snowflake. Finally, specify the database and schema where this Git repo lives. Type dash dash database equals course underscore repo and dash dash schema equals public. Okay, that's it. Let's run it. Give it a little while to process the command. Once it's done, navigate over to Snowsight and confirm that the data environment was indeed created. Great.
Let's do this once more, but this time pass in prod as the environment argument. Once again, run the command and give it a little while to finish. You should see another success statement. Okay, you now have your data environments set up.
Let's now build a pipeline in each of these environments. We'll follow the same pattern, but this time point to a new path in the repo containing the pipeline objects. Let's start with the same command. Now let's update the path. I'm going to hold down Option on my Mac keyboard and click on the part I want to edit. I'll delete this part of the path, backing up to pipeline, and now type the path to the objects folder. Make sure it reads objects slash. If I leave the path open-ended like this, it'll run all files in this directory and its subdirectories. I'll also set my env variable back to staging, just to follow the same order as before. I'll hit Enter and... it's done. It looks like everything was built successfully.
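
As a recap of the commands in this exercise, here is a minimal sketch that only prints them, so you can review each one before pasting it into the terminal. It includes the prod pipeline run, which comes next. The exact spellings of the repository object, folders, and file name (hyphens, underscores, letter case) are assumptions reconstructed from the spoken walkthrough, so check them against your fork. The helper name print_execute is made up for this illustration.

```shell
#!/bin/sh
# Sketch only: prints the snow git execute commands narrated in this exercise.
# Nothing here connects to Snowflake.

# ASSUMPTION: the spellings below are reconstructed from the narration;
# verify them against your forked repo before running anything.
REPO="@advanced_data_engineering_snowflake/branches/main"
PIPELINE="module-1/hamburg_weather/pipeline"
DATA_FILE="$PIPELINE/data/load_tastybytes.sql"
OBJECTS_DIR="$PIPELINE/objects/"   # trailing slash: run every file under it

# Hypothetical helper: print one command for a repo path ($1) and an
# environment name ($2). COURSE_REPO.PUBLIC is where the Git repo object lives.
print_execute() {
  printf "snow git execute %s/%s -D \"env='%s'\" --database=course_repo --schema=public\n" \
    "$REPO" "$1" "$2"
}

# Data environments first, in the same order as the walkthrough.
for env_name in staging prod; do
  print_execute "$DATA_FILE" "$env_name"
done

# Then the pipeline objects in each environment.
for env_name in staging prod; do
  print_execute "$OBJECTS_DIR" "$env_name"
done
```

Running the script prints four lines, one per command. Printing rather than executing is a deliberate choice here: it keeps the sketch safe to run anywhere and lets you correct a path before anything touches your account.
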
Let's confirm these objects were indeed created in my staging tastybytes database. And yes, there they are. We also need to create the production pipeline. So now let's rerun this command and this time replace the word staging with the word prod. Once it's done, go ahead and confirm that the pipeline does indeed exist. Great job.
A quick note: this isn't exactly the same pipeline as before, but it's mostly the same. The main difference is that we built two pipelines this time, each one in its own database environment. These two environments serve specific purposes. I briefly touched on this, but one database represents the staging environment and the other represents the production environment. The staging environment is intended for development and testing. When used with source control, this pipeline provides a safe environment for introducing and testing new changes. After testing and confirming that the changes work as intended, they would be deployed to the production environment, the database that starts with prod. This pipeline represents the live pipeline being actively used for whatever end purpose, in this case for building a Streamlit in Snowflake application.
This entire workflow is part of a DevOps best practice known as continuous deployment. This practice lets data engineers safely introduce new changes and roll them out to the end user after confirming that they work properly. We'll dive into even more detail on this topic in an upcoming video.
Okay, let's quickly finish up the pipeline with our Streamlit app. Navigate to the app.py file in the Streamlits folder. Copy the contents of the file. Next, navigate to Snowflake. Click on Projects, then click on Streamlit. Create a new Streamlit app. Call it whatever you want. Select the staging environment as its location and select the public schema. Leave everything else as is. Click Create. Wait for the app to boot up. Next, click Edit.
Delete all of the code in the editor and paste in the code we copied. Click run. Uh-oh, you should notice that the app loads, but it's not rendering anything. It looks like it's broken and maybe some data is missing. We're going to fix this by introducing a change that follows DevOps best practices. Okay, that's it. Now that we have our pipeline set up, along with a broken app that we need to fix, join me in the next video to start introducing changes to the pipeline.

2. Let's practice!
