Recap and best practices for batch ingestion

1. Recap and best practices for batch ingestion

We've covered a range of common and powerful techniques for performing batch ingestion with Snowflake. They ranged from no-code options that let you quickly load data into your Snowflake account to techniques that ingest data from a local file system or from cloud storage.

Let's quickly recap the techniques that we covered in this module:

- Loading data from Snowflake Marketplace
- Batch ingestion using Snowflake's web interface
- Batch ingestion with Snowflake CLI
- Batch ingestion from cloud object storage using the `COPY INTO` command
- Loading data using Snowflake Connectors

You also learned how to make the most of Snowflake's compute resources, virtual warehouses, so that you're fully utilizing them when performing batch ingestion. This is an especially useful concept that will level you up as you build your data pipelines.

All of these ingestion techniques play important and unique roles in building data pipelines. But as always, your use case will vary. As I mentioned earlier in the course, the intent of this module is to present you with some of the most common and powerful batch ingestion techniques. These techniques, of course, aren't an exhaustive list. But with these core techniques in your tool belt, you're well-equipped to explore other techniques or to think about how to combine them based on what you're trying to do in your data pipeline.

Finally, let's recap some best practices to keep in mind when using these techniques. First, 100 to 250 megabytes is the ideal file size range when loading data into Snowflake. Snowflake's virtual warehouses load files in this range most efficiently. Split larger files into chunks that fall within this range whenever possible. You can use handy command line utilities to help you do this. Compress your files whenever possible. Snowflake can handle the decompression for you when you load those compressed files.
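To make the file-splitting and compression advice concrete, here is a minimal sketch in Python. It is only an illustration, not a Snowflake utility: the function name `split_and_compress` and the 150 MB default are hypothetical choices, and it assumes a line-delimited file with no header row. The threshold counts uncompressed bytes, so the gzip outputs will be smaller than the target and you may want to tune it for your data.

```python
import gzip
from pathlib import Path


def split_and_compress(src, out_dir, target_bytes=150 * 1024 * 1024):
    """Split a line-delimited file into gzip chunks of roughly target_bytes each.

    target_bytes is measured on the uncompressed data (an assumption of this
    sketch). Lines are never split across chunks.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    chunks = []      # paths of the chunk files written so far
    writer = None    # currently open gzip file, if any
    written = 0      # uncompressed bytes written to the current chunk

    with src.open("rb") as reader:
        for line in reader:
            # Start a new chunk when none is open or the current one is full.
            if writer is None or written >= target_bytes:
                if writer is not None:
                    writer.close()
                path = out_dir / f"{src.stem}_{len(chunks):04d}{src.suffix}.gz"
                writer = gzip.open(path, "wb")
                chunks.append(path)
                written = 0
            writer.write(line)
            written += len(line)

    if writer is not None:
        writer.close()
    return chunks
```

Calling `split_and_compress("orders.csv", "staged/")` (both names hypothetical) would produce files such as `staged/orders_0000.csv.gz`, which could then be uploaded to a stage and loaded with `COPY INTO`.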
Always seek to understand the total number of threads available to you in a virtual warehouse. This way, you can aim to fully utilize the warehouse when loading your data. Remember, you can use the size of the warehouse in combination with the number of nodes in the cluster to figure out the total number of threads available to you.

In general, I recommend starting with an extra-small virtual warehouse and scaling up only as necessary. If you don't specify a size when creating a new virtual warehouse, Snowflake will default to extra-small. There's a lot of detailed technical guidance on the web about virtual warehouses. I encourage you to do some more research on this topic if you're interested in learning more about how they work.

Okay, with these ingestion techniques now in your tool belt, you might be wondering what's next. Well, chances are the raw data you've ingested into Snowflake still needs a lot of work before it can provide the insights you're looking to extract. So join me in the next module to dive into data transformations in Snowflake.
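As a rough illustration of the thread-count guidance from this recap, the sketch below estimates how many files a warehouse could load in parallel. The numbers are assumptions, not values stated in this transcript: it assumes each node provides 8 threads and that the node count doubles with each warehouse size, starting from 1 node for extra-small. The names `NODES_PER_CLUSTER`, `total_threads`, and `load_waves` are hypothetical.

```python
import math

# Assumed nodes per cluster for each warehouse size (doubling per size step).
NODES_PER_CLUSTER = {
    "XSMALL": 1,
    "SMALL": 2,
    "MEDIUM": 4,
    "LARGE": 8,
    "XLARGE": 16,
}

# Assumed number of load threads per node; treat this as a rule of thumb.
THREADS_PER_NODE = 8


def total_threads(size, threads_per_node=THREADS_PER_NODE):
    """Estimate the threads available in one cluster of the given size."""
    return NODES_PER_CLUSTER[size.upper()] * threads_per_node


def load_waves(num_files, size):
    """Estimate how many rounds are needed if each thread loads one file at a time."""
    return math.ceil(num_files / total_threads(size))


print(total_threads("XSMALL"))   # 8 under the assumptions above
print(load_waves(100, "SMALL"))  # 7 rounds: 100 files over 16 threads
```

Under these assumptions, staging about as many files as there are threads (or a multiple of that) keeps the warehouse busy, while a single large file would leave most threads idle.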

2. Let's practice!
