DE - Snowpipe - Part I
1. DE - Snowpipe - Part I
Okay, so in a previous video, we talked briefly about how you can ingest data with Snowpipe – if you’re using Snowpipe streaming, this can happen in seconds, and if you’re using regular Snowpipe, this can happen in under a minute. Here we’re going to test out regular Snowpipe with auto_ingest \= TRUE to see how to get this working in practice. Let’s get into it. Okay, we’re going to start all of this off in AWS because if we don’t have data to ingest, then Snowpipe is not very interesting. And I’m picking AWS just because lots of people are familiar with S3, but I should note that this works with cloud storage in GCP and Azure as well. For a long time, I was intimidated by the idea of logging directly into the AWS console. My data experience was mostly at big companies that created layers of abstraction between me and most third-party services, including the big cloud providers. But my worrying was mostly not justified. You could go create an account today and start messing around, and you’ll find that it’s much less intimidating than you might have thought. At least, that’s been true for me. Okay, so here’s what I’ve already done – I’ve already created an S3 bucket called “intro-to-snowflake-snowpipe.” And I’ve put one file in that bucket, called “food.csv.” It’s just got two columns, one that lists a food name, and the other that lists a tastiness score. I feel fine not having walked through those steps with you because I think if you needed to, you wouldn’t have too much trouble creating an S3 bucket and getting a file in there. What might be more challenging is getting all the permissions set up in your AWS account and Snowflake so that the two can communicate – at least, it was for me. So here’s how we do this. We want to make sure Snowflake has Identity and Access Management (IAM) policy permission to access your S3 bucket with GetObject, GetObjectVersion, ListBucket, and GetBucketLocation. So let’s navigate to the IAM service in our AWS console. We do this by typing “IAM” in the search bar. Then we click on the first option. We need to go to Account settings, and then confirm that the Security Token Service lists our account’s region as Active. In this case, that’s US West (Oregon), so we’re good. Cool, so now we click on “Policies” on the left hand side of the screen under “Access Management,” and then we click “Create Policy.” Then we create a new IAM policy named “snowflake_access,” and fill it with the following: { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:GetObject", "s3:GetObjectVersion" ], "Resource": "arn:aws:s3:::intro-to-snowflake-snowpipe/*" }, { "Effect": "Allow", "Action": [ "s3:ListBucket", "s3:GetBucketLocation" ], "Resource": "arn:aws:s3:::intro-to-snowflake-snowpipe", "Condition": { "StringLike": { "s3:prefix": [ "*" ] } } } ] } You can see that this is a step along the way to giving Snowflake permission to do four different things – It’s being allowed to GetObject and GetObjectVersion for the files in our bucket (that asterisk means the permissions apply to all files in that bucket) and it’s being allowed to ListBucket and GetBucketLocation for the “intro-to-snowflake-snowpipe” S3 bucket itself. Okay, so now we click “next.” Let’s make “snowflake_access” the policy name. So now we’ve created an access policy, but right now it’s just sitting there, not actually associated with anything. To get it to have force, we need to create an IAM role to apply it to, and then we need to associate that role with our Snowflake account. So let’s go to ROLES on the left hand side of the IAM, and select “CREATE ROLE.” Select the “AWS ACCOUNT” radio button. Select “This account” down below, and then require an external ID. (I’m just going to put 0000.) Next we select that we want to associate our “snowflake_access” permissions policy with this role. Then we name the role – I’ll call this “snowflake_role_snowpipe” Then we create the role. At this point you might be wondering: “Peter, this is not a course on AWS. Why are we doing all this?” At least, I’ve thought this same thing in other courses I’ve taken. It’s because, in my experience, the most challenging part of getting Snowpipe set up so you can ingest data is making sure you have the right permissions for it to access your external data. The rest is pretty easy. And a lot of that connecting has to be done through the console of your web provider – in this case, AWS. I don’t expect you to absorb all of this, but I wanted to make sure you’ve seen it at least once so that you have a sense of the steps involved. Okay, so now it’s finally time to hop into Snowsight. Now we get to create an integration that connects Snowflake to our S3 bucket! Very exciting. We haven’t talked about the Snowflake integration object so far in this course, but the idea is pretty straightforward: It’s an “object that provides an interface between Snowflake and third-party services.” We do that with the “CREATE STORAGE INTEGRATION” command: CREATE OR REPLACE STORAGE INTEGRATION S3_role_integration TYPE \= EXTERNAL_STAGE STORAGE_PROVIDER \= S3 ENABLED \= TRUE STORAGE_AWS_ROLE_ARN \= "arn:aws:iam::849350360261:role/snowflake_role_snowpipe" STORAGE_ALLOWED_LOCATIONS \= ("s3://intro-to-snowflake-snowpipe/"); We set the type as “EXTERNAL_STAGE,” we set the storage_provider as “S3,” etc. The two key parts – and the parts that might trip you up – are that we need to give the role ARN (ARN stands for “Amazon Resource Names” – they’re unique identifiers in AWS land) and we need to specify the bucket we want to give access to. The bucket part’s not that hard: Here it’s s3://intro-to-snowflake-snowpipe/ And the ARN is not hard to find if you know where to look. Just go to the role you created, and at the top of the screen you’ll see ARN and a little “copy” icon you can click to copy that. Cool, now we run that. Okay, so now the Snowflake integration knows something about AWS – we specified the ARN of the AWS role we want to use, and we specified the S3 bucket – but so far, we haven’t told AWS about Snowflake. And you can imagine that when two services are reaching out to each other, they’re both naturally suspicious, and they both need to know something about the other. It’s not enough for one to reach out to shake hands – they both have to feel comfortable reaching out to shake hands. So this is the moment we finally give some information about Snowflake back to AWS so AWS can feel comfortable permitting this connection. We find out the info we need to hand over by running the “DESCRIBE INTEGRATION” command in Snowflake, followed by the integration name: DESCRIBE INTEGRATION S3_role_integration; We take the STORAGE_AWS_IAM_USER_ARN, and we copy that. Then we go back into the AWS console and go to our role… and if you had to guess which of the things you can see we click on to get AWS to trust Snowflake, which do you think it is?... That’s right, “Trust Relationships” and then “Edit trust policy.” Then we replace the existing AWS_IAM_USER_ARN with the one we pulled from Snowflake. And we copy the STORAGE_AWS_EXTERNAL_ID from Snowflake, and replace the “sts:ExternalID” with that. { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::975050217095:user/4xhj0000-s" }, "Action": "sts:AssumeRole", "Condition": { "StringEquals": { "sts:ExternalId": "WCB79566_SFCRole=2_KvL/lAQKLGDsikzubebFvlvV4og=" } } } ] } Okay, so just to recap – It felt like we just did a lot, but actually we did basically four things: We created an access policy in AWS, then we created a role that’s associated with that policy, then we created an integration in Snowflake and made it aware of our AWS role, and then we edited the AWS role to make it aware of our Snowflake integration. So basically, we hosted a dinner party where we were like: “Steve, meet Li; Li, meet Steve.” In the next video, we’ll actually go ahead and create our Snowpipe. Let’s do it!2. Let's practice!
Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.