DE - Snowpipe - Part II

1. DE - Snowpipe - Part II

We’ve learned how to create an access policy in AWS, how to create a role associated with that policy, how to create an integration in Snowflake and make it aware of our AWS role, and how to edit the AWS role to make it aware of our Snowflake integration. In this video, we’ll actually create a Snowpipe and see it in action. Let’s do it. First let’s create a new database and a table in that database, and let’s make sure we’re using the public schema in our new database: CREATE OR REPLACE DATABASE S3_db; CREATE OR REPLACE TABLE S3_table(food STRING, taste INT); USE SCHEMA S3_db.public; Okay, now we’re back to our old friend, the stage. We set the url to our S3 bucket, and instead of putting a password or something like that in there, we pass our integration object to the stage. Remember, that object is trusted by AWS, so the stage should be all set with the permissions it needs to pull data from S3. CREATE OR REPLACE STAGE S3_stage url \= ('s3://intro-to-snowflake-snowpipe/') storage_integration \= S3_role_integration; Let’s take a look at our stage by running SHOW STAGES SHOW STAGES; Looks good. Pretty straightforward. Okay, now the moment of truth – Let’s see if there’s anything in our stage! If our stage successfully connected to S3, it should have our food.csv file in there. If not, then we’ve got to troubleshoot. LIST @S3_stage; It worked! There’s food.csv. Food, glorious food. And just to double check, let’s query the stage directly with: SELECT $1, $2 FROM @S3_stage; (I don’t think we covered this when we learned about stages a long way back, but dollar-sign-1 pulls the first column from a stage, and dollar-sign-2 pulls the second column from a stage.) Cool! We’re in business!!! If we were being careful here, we’d create a new role, and give it all the necessary privileges to manage our Snowpipe (we’d grant usage on our database and schema, we’d grant insert on the table we created called “S3_table,” we’d grant usage on the stage we created called “S3_stage,” we’d grant ownership of the pipe, and we’d grant the role to our username). But for the sake of speed, we’re going to live dangerously and run this next step with ACCOUNTADMIN: CREATE PIPE S3_db.public.S3_pipe AUTO_INGEST=TRUE as COPY INTO S3_db.public.S3_table FROM @S3_db.public.S3_stage; Let’s slow down here for a moment – We’re creating our pipe, and we’re calling that pipe “S3_pipe” (and we’re putting it into the S3_db database, and the public schema). Then we’re copying that into our S3_table from our S3_stage. This style of syntax should look very familiar at this point – Just like we create functions and databases and tables with “CREATE” plus the object name, we create our pipe with CREATE PIPE. And because we specified auto_ingest \= True, we don’t have to manually specify any other method for ingestion (like calling a REST API, etc.) So this is nice and easy. Okay, so we run that, and if we query our table S3_table, it should be empty right now, because even though we have data in our S3 bucket, the way we set this up, our Snowpipe will only pick up data in our bucket when there’s a change to the S3 bucket – like a file gets uploaded. So it will miss the data that’s in there presently. Let’s confirm: SELECT * FROM S3_db.public.S3_table; Yep, that doesn’t return anything. Now let’s go over to our S3 bucket and upload a file. (We’ll just upload a copy of our food.csv data). Then if we wait about a minute… and run this query again… SELECT * FROM S3_db.public.S3_table; Now we see that there’s data there! And we didn’t have to do anything manual, or set up any additional orchestration. Because of auto_ingest, Snowpipe recognized that there was an upload file event in our S3 bucket, and it pulled the new file into our table through our stage. Sweet. Okay, so let’s take a closer look at our pipe by running the SHOW PIPES command: SHOW PIPES; We can see when it was created, its name, the database and schema it’s in, its definition, etc. Pretty cool. In this case, we get the exact same types of information if we run DESCRIBE PIPE, followed by the name of our pipe: DESCRIBE PIPE S3_db.public.S3_pipe; SHOW PIPES and DESCRIBE PIPE yield the same columns. If you want to change the property of a pipe, you can use the ALTER PIPE command. Let’s pause our pipe by setting PIPE_EXECUTION_PAUSED equal to TRUE. ALTER PIPE S3_db.public.S3_pipe SET PIPE_EXECUTION_PAUSED \= TRUE; Awesome. And then, as you might expect, we can drop the pipe with DROP PIPE: DROP PIPE S3_pipe; And if we run SHOW PIPES again, we’ll see that our good old reliable pipe is gone: SHOW PIPES; Farewell, dear pipe. That might have felt like a whirlwind – that’s fine! I did not expect you to absorb everything we just covered. Here’s what I hope you did take away: One, in order to use Snowpipe with cloud storage, you need to create a Snowflake integration object Two, in order to make the integration object work, you need to give Snowflake the appropriate credentials to communicate with AWS, and AWS the appropriate credentials to communicate with Snowflake Three, the DESCRIBE INTEGRATION command allows you to see the info you need to enter into your AWS account for it to trust Snowflake Four, you can create a pipe with the CREATE PIPE command. Five, if you set the auto_ingest property to true for your pipe, it will automatically check to see if your cloud storage location has had new files uploaded to it, and it will pull those into your destination table Six, you can show all of your pipes with SHOW PIPES Seven, you can get metadata about a given pipe with DESCRIBE PIPE, followed by the name of the pipe Eight, you can drop a pipe with DROP PIPE, followed by the name of the pipe Awesome! That’s a lot. If you’re wildly confused, you should definitely take some of our Snowflake coursework on Data Engineering. And if you’re like “I’ve got this,” and want to move your skills to the next level, you should still take some of our Snowflake coursework on Data Engineering.

2. Let's practice!

Create Your Free Account

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.