Loading data using the COPY INTO command
1. Loading data using the COPY INTO command
So far, our batch ingestion techniques have been no code or low code, which is pretty neat. To quickly recap, we've pulled in data from the Snowflake Marketplace, performed ingestion of local data files using the web interface, and performed ingestion of data using the command line interface. Using these techniques, you've seen firsthand how easy it is to get data into Snowflake. Let's now dive into one of the most common and powerful techniques for batch data ingestion, Snowflake's `COPY INTO` command. The `COPY INTO` command is a SQL command that allows you to ingest data from stage files into a table. When I say stage files, I simply mean files that live in a stage. A stage is where a file can be stored before its data is ingested into Snowflake. The stage can be a Snowflake internal stage or an external stage that points to third-party cloud storage. For example, you can create an external stage in Snowflake and specify that the stage should point to, say, an AWS S3 bucket, for example. In fact, ingesting data from cloud object storage using the `COPY INTO` command is a very common pattern for batch ingestion, and this is the technique we'll cover in this video. The general gist involves these main steps. First, define the table that you want to load data into if you haven't already done so. If you do have an existing table that you want to load data into, you wouldn't necessarily have to do this step, assuming the table schema matches the data that you're loading. Next, define a file format to use during ingestion. You might define a specific file format object if you have many options that you want to configure for the ingestion, things like the FIO delimiter, type formatting, trimming whitespace, and much more. Otherwise, you're probably okay with defining the file format inline and simply specifying the format of the data files you're working with, such as CSV, JSON, or Parquet. You'll get hands-on experience with this shortly. Next, you define an external stage. When defining your external stage, you'll specify the address to your cloud object storage, like your AWS S3 bucket, and any credentials needed to access the bucket. Finally, write the `COPY INTO` command and put everything you've defined together. You'll specify the table you'll load the data into, the stage containing the data files, and the file format to use during ingestion. You can also specify what should happen in case errors are encountered during ingestion using the onError option, which is the same option you saw when loading data using the web interface earlier, or if you browse a SQL file that we used to load data using the Snowflake CLI. You'd write and execute all this in a Snowflake worksheet. In fact, let's get hands-on and walk through the process together. Pause the video now if you need to log into your Snowflake account. Let's start by defining a table that we'll load data into. Navigate to the sample menu SQL file in the Module 2 folder of the repo. Copy the first 18 or so lines of code and paste them into a new SQL worksheet. In these lines of code, I create a table called sampleMenuCopyInto and specify the column definitions and their corresponding data types. Run the file. We'll now type the rest of the code into the worksheet manually so that you can get some practice with some important concepts. Note, however, that the sampleMenu SQL file contains all of the code for this exercise in case you simply want to paste the entire file into the worksheet and run it. Let's now create our stage. Type the following. create or replace stage load underscore data dot public dot blob underscore stage url equals apostrophes s3 colon slash slash sf quickstarts slash tastybytes slash file underscore format equals type equals csv semicolon This creates a stage called blob stage, and the URL parameter points to the address of an AWS S3 bucket containing the CSV files. You'll note that I'm not specifying credentials for the bucket here, and that's intentional. For the purposes of convenience, we've made this bucket publicly accessible so that you can follow along. But the best practice is to secure your bucket and authenticate by passing in the correct credentials. Run this block of SQL. Okay, great. The stage is created. We can quickly take a peek at what's in the bucket. Type list at load data dot public dot blob stage slash raw underscore POS slash menu slash semicolon. This will list the files in the raw POS menu folder of the S3 bucket. In the results pane, you'll see that there's a compressed CSV file there called menu dot csv dot gz. We're going to load this data into our table. Now let's write the `COPY INTO` command. Type copy into. Load data. Dot public. Dot sample menu. Copy into. From. At load data dot public dot blob stage slash raw POS slash menu slash semicolon. I specify the table to write to and the file within the external stage. Let's run this command. Okay, I see in the results pane a status of loaded and 100 rows that were loaded. Let's sanity check ourselves. Type. Select. Star from. Load underscore data. Dot public. Dot sample menu. Copy into. Run this SQL. You should get the 100 rows shown in the results pane. And that's it. That's how to use a `COPY INTO` command for batch ingestion of data files in cloud object storage. We did all of this by piecing together the various components needed to use a `COPY INTO` command. The target table, the stage, and the file format. If you browse the corresponding Snowflake documentation, you'll learn that you can actually do all this in a single command. But I demonstrated each piece so that you can see how a `COPY INTO` command is typically constructed. Here are a couple of best practices and considerations to keep in mind when using copy into. Ensure your data is ready to be loaded, meaning you should have an idea of the structure of your data and how it will map to a table in Snowflake. Create your target tables ahead of time. This will make the process a little more efficient. And finally, this is an extremely common pattern for batch ingestion. It's common to use this for initial environment setups, but also very common as part of scheduled loading of large amounts of data into Snowflake. We'll cover how to automate this workflow in a later module. But it's good to get exposure to this concept now in a manual fashion so you can build up to more robust automated pipelines. So far, we've loaded samples of data, mostly for the purposes of focusing on the method of ingestion rather than the nature of the data. But moving forward, this will change. Be sure to complete the reading for this module. Using the instructions in the reading, you'll load a bunch of data related to food truck sales. We'll use that data in subsequent exercises. Closely follow the instructions in there to be sure you're set up with the data needed for the rest of the course. OK, coming up, I'll talk about how to ingest data into Snowflake from other data systems using connectors.2. Let's practice!
Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.