
1. Populating column-oriented databases

Now that we've jumped into the world of NoSQL, we'll spend some time exploring the ins-and-outs of column-oriented databases. First up, populating these NoSQL data stores.

2. Populating row-oriented vs. column-oriented databases

Populating row-oriented and columnar databases looks a little different. Row-oriented databases are typically used to store "transactional" data and perform well when inserting, updating, or deleting individual records; since data is stored by row, only the affected rows are touched by these operations. Column-oriented databases, which are typically used for analytics workloads, perform well when loading, updating, or deleting data in bulk. For example, a Snowflake database used for market-basket analysis might be bulk-loaded daily by Airflow, which pulls all the records from the backend of a mobile-ordering SQL database that handles the frequent, record-level insertions and deletions.

3. CREATE TABLE

Snowflake data warehouses consist of databases, which in turn contain schemas. Schemas house tables, where data is stored and organized using Snowflake's column-oriented architecture. Before we can load a Snowflake table with data, we'll need to create one. First, the table name is passed after the CREATE TABLE keywords. Then, in parentheses, the column names and their associated types are defined, with each name-type pair separated by commas.
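Here's a minimal sketch of that syntax; the table and column names (books, title, author, price) are hypothetical examples, not taken from a real dataset:

```sql
-- Create a table in the current database and schema.
-- Each column is defined as a name-type pair, separated by commas.
CREATE TABLE books (
    title VARCHAR,
    author VARCHAR,
    price NUMBER(10, 2)
);
```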

4. COPY INTO

Once a table has been created, it can be populated using Snowflake's COPY INTO command. COPY INTO takes advantage of Snowflake's columnar architecture to efficiently load data in bulk into a table within a warehouse and schema. Snowflake supports loading data from a variety of sources, as well as parallel data loading; behind the scenes, it automatically manages data distribution and optimization. In this example, data is loaded from a local file, data-science-books-dot-csv. In addition to files from a local machine, COPY INTO supports loading data from a cloud storage location, URL, or staged files. After the path of the file is provided, the format of the file is specified: the file type and delimiter are given, and the header row is skipped. Beyond these three format parameters, Snowflake offers many more to customize the COPY INTO command.
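As a sketch, loading the CSV file above might look like the following; the target table name (books) is an illustrative assumption, and note that local files must first be uploaded to a stage (for example with PUT) before COPY INTO can read them, so here we copy from the table's internal stage:

```sql
-- Bulk-load a CSV file, previously uploaded with PUT, into an existing table.
COPY INTO books
FROM @%books/data-science-books.csv
FILE_FORMAT = (
    TYPE = 'CSV',           -- the type of file being loaded
    FIELD_DELIMITER = ',',  -- the column delimiter
    SKIP_HEADER = 1         -- skip the header row
);
```

The three parameters inside FILE_FORMAT correspond to the file type, delimiter, and header-skipping options described above.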

5. CREATE TABLE ... AS

Sometimes, a Data Engineer may want to create and populate a new Snowflake table from a query against an existing table. Snowflake's CREATE TABLE ... AS syntax does just that, and it works much like a plain CREATE TABLE statement. A name for the table is provided, which will create a new table in the current schema. In this example, we'll use CREATE TABLE ... AS to create a new table, called premium_books, containing all books priced above fifty dollars. After the table name and the AS keyword, a SELECT query populates the newly defined table with the data it returns. This takes the place of defining column names and types; the resulting table takes the schema of the result set returned by the query. Sometimes, we'll want to re-populate an existing table. To do so, we can add OR REPLACE to the CREATE TABLE ... AS statement, as shown on the left. This will create a new table if one does not exist, or replace the existing table with the results of the SELECT statement.
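A sketch of the two statements described above, assuming the source table is named books and has a price column:

```sql
-- Create and populate premium_books from a query result;
-- the new table takes the schema of the SELECT's result set.
CREATE TABLE premium_books AS
SELECT *
FROM books
WHERE price > 50;

-- Re-populate: create the table if it doesn't exist,
-- or replace it with the new query results.
CREATE OR REPLACE TABLE premium_books AS
SELECT *
FROM books
WHERE price > 50;
```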

6. Let's practice!

Awesome work! Let's practice creating and populating tables in Snowflake with some hands-on exercises. Best of luck!