Batch ingestion with Snowflake

1. Batch ingestion with Snowflake

Data ingestion with Snowflake can be broadly categorized into two buckets: batch ingestion and streaming ingestion. Batch ingestion refers to ingesting data into Snowflake in large, discrete chunks. It typically occurs at scheduled intervals, but it doesn't have to; it can also be a one-time event. Batch ingestion is commonly used where real-time processing of data is not critical, and it allows for the efficient handling of large volumes of data. For example, you might use batch ingestion when you're migrating data from another system into Snowflake, when you're setting up a Snowflake environment for the first time, or when your pipeline ingests large amounts of data on a daily schedule, like an overnight job.

Streaming ingestion, on the other hand, refers to the continuous, real-time ingestion of data into Snowflake. Unlike batch ingestion, which handles data in large, periodic chunks, streaming ingestion deals with data piece by piece, virtually instantaneously. Streaming ingestion is critical for use cases that require immediate analysis and action based on the latest data, like financial trading or real-time monitoring of instrumentation or equipment.

In this course, we're going to cover batch ingestion with Snowflake. Streaming ingestion will be covered in detail in a different course designed as the follow-up to this one.

Batch ingestion with Snowflake is generally file-based, meaning you'll ingest data that you have in files. This process typically involves three steps. First, preparing your data files, such as your CSV, JSON, or Parquet files. Second, staging, or storing, those files somewhere. A very common pattern is staging data files in cloud object storage, like an AWS S3 bucket, and then ingesting the data in those files directly from the bucket into Snowflake. But this is just one of many patterns commonly seen. It's also common to have files stored on local computers, in other data systems, and more. Third, you'll need to actually perform the data ingestion. Storing the data somewhere is one thing, but performing the ingestion means actually bringing the data into Snowflake. There are several options here, from Snowflake's easy-to-use web interface to patterns that use SQL or Python for ingestion.

So to summarize, the typical pattern for batch ingestion with Snowflake involves preparing files, staging the files, and then performing the data ingestion into Snowflake.

There are a few different ways to perform batch ingestion with Snowflake, and here are the techniques that we'll cover in this module. First, loading data from the Snowflake Marketplace, where you can discover and quickly load high-quality datasets directly into your Snowflake account. Second, loading data using Snowflake's web interface, which, by the way, is also known as Snowsight, just in case you hear me use that term. Third, ingestion using Snowflake's CLI, meaning loading data from your local computer into Snowflake using Snowflake's command-line interface. And finally, one of the most common and powerful techniques for ingesting data: using the `COPY INTO` SQL command to load data from files in cloud object storage into Snowflake.

We'll cover each of these techniques individually, but chances are your approach when building a data pipeline may include one or more of these methods, all working in concert to ingest data. Let's start with loading data from the Snowflake Marketplace.
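
To make the prepare, stage, ingest pattern a bit more concrete, here is a minimal Python sketch. It is only an illustration, not how the course itself does it: the stage name `my_stage`, the table name `customers`, the sample rows, and the helper functions are all hypothetical, and the script only writes a local CSV file and builds the SQL text for Snowflake's `PUT` and `COPY INTO` commands without connecting to an account. It also assumes a Unix-style local file path.

```python
import csv
import tempfile
from pathlib import Path


def prepare_csv(path, rows):
    """Step 1: prepare a data file (here, a small CSV with a header row)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "name"])
        writer.writerows(rows)


def build_statements(path, stage, table):
    """Steps 2 and 3: build the SQL text for staging and ingesting the file."""
    # PUT uploads a local file to a Snowflake stage.
    put_sql = f"PUT file://{path.as_posix()} @{stage}"
    # COPY INTO loads staged files into a table; SKIP_HEADER skips the header row.
    copy_sql = (
        f"COPY INTO {table} FROM @{stage} "
        "FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)"
    )
    return put_sql, copy_sql


# Hypothetical names: my_stage and customers are placeholders for illustration.
data_file = Path(tempfile.mkdtemp()) / "customers.csv"
prepare_csv(data_file, [(1, "Ada"), (2, "Grace")])
put_sql, copy_sql = build_statements(data_file, "my_stage", "customers")

print(put_sql)
print(copy_sql)
```

In a real pipeline, you would run the two statements through a Snowflake client against an existing stage and table; when the files already sit in cloud object storage, the `PUT` step is not needed and `COPY INTO` would read from an external stage instead.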

2. Let's practice!
