Stages and Basic Ingestion - Part I
1. Stages and Basic Ingestion - Part I
We now know a bunch about virtual warehouses, so let’s put our knowledge to use to ingest some data! If you have data that you want to load to a table in Snowflake, you do that by way of an intermediate object called a stage. Data on your local machine? You’ll first create a stage object. Data in external cloud storage? You’ll first create a stage object. Only after making the stage object do you copy the data to a table. The stage serves as a bridge between your data source and your table. In this course, we’re not going to get into *why* stages exist. We’re just going to learn how to use them. Earlier, we played around with Menu data from Tasty Bytes, the food truck company. Now we’re going to ingest a *lot* more Tasty Bytes data– order data, customer data, truck data, and more – and we’ll use that again and again throughout the course. Okay, so in this worksheet, I’ll start by making sure I’m using the super-powerful role of accountadmin. We won’t talk about roles until later in the course, but for now you should know that what I’m doing here is *not* a best practice – You don’t typically just casually use the all-powerful accountadmin role. But in this course we’re not going to worry about it, especially not before we cover roles. Next I’ll create a database and a few different schemas. (We’ll talk about databases and schemas in the next video.) The one thing I’ll point out is that we have two raw schemas, and then a cleaned-up schema called “Harmonized,” and finally our tippity-top schema called “Analytics.” Next we’ll create two warehouses – One just for loading our data, and one for doing some analysis work later. Note that the “demo_build_wh” that we’re using to load our data is a TRIPLE XL!!! Whenever you use a triple XL warehouse, you should tremble a little bit, because it’s REALLY powerful. That’s 64 times as powerful as our standard X-small warehouse, and burns 64 times the credits per hour. It’s like a giant laser cannon. Super powerful, but also a big responsibility. Here, we’ll use it very briefly to load the TastyBytes data from an S3 bucket, and then we’ll drop it and never use it again. Okay, so then we create a very simple csv file format. Don’t worry about this. It’s not critical at the moment. And finally we get to the focus of this video – Creating our stage! Let’s run this and talk about what we did. So there are two kinds of stages -- external, and internal. Here we’ve got an external stage, and we can tell because you can see it’s pulling from an S3 bucket. The key thing to know about external stages is the data they’re connected to is *not* managed by Snowflake. This means Snowflake isn’t responsible for controlling access to that data, you don’t get billed through Snowflake for the storage of that data, etc. This means that when creating an external stage, we’ll always add a reference to an external cloud data storage location – AWS S3, or storage in Google Cloud or Azure – plus credentials for accessing that bucket if necessary. (Here credentials weren’t necessary.) If you don’t see a url referencing external storage, then it’s not an external stage. Internal stages are different. Snowflake *does* manage the cloud storage for internal stages, and that means Snowflake takes care of security, Snowflake manages the associated cloud storage billing, etc. So if you saw a stage creation command that *didn’t* have a reference to an external cloud data source, you’d know it’s internal. It would look something like this: CREATE OR REPLACE STAGE frostbyte_tasty_bytes.public.internal_stage_test; No url. No access credentials. We’re not really going to cover file formats in this course – I’ll just mention that they give Snowflake information about the kind of file you’re about to ingest. But I do want to note that you can use file formats with both internal and external stages, even though we’re only seeing it next to the external stage. To recap, in this video, we explained what a Snowflake stage object is, we created an external stage and uploaded files to it using the CREATE STAGE command, and we explained the difference between an external and internal stage Next, we’ll learn about the three different kinds of internal stages, and then we’ll actually load files from a stage into an existing table!2. Let's practice!
Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.