Cloning

1. Cloning

In Snowflake, when you clone an object, you’re creating a new version of it that is *zero-copy*. What this means is that at the time you create the clone, the clone relies on the same stored data – the same micropartitions – as the object you copied. Once you start making changes to the clone, then Snowflake keeps track of the changes, so your total data stored by virtue of having the clone will start to increase. But at the time of cloning, you won’t increase your storage amount. This is really cool for a number of reasons. One of my favorites is that cloning makes it very easy to test things – if you’re worried about messing up production data, you can near-instantly clone that dataset and use the clone instead, whereas if you had to create a whole new copy that involved recreating the underlying data, your total storage would be much higher and it would take longer. You can clone a lot of different Snowflake objects – [databases](https://docs.snowflake.com/en/sql-reference/sql/create-clone#syntax), schemas, tables, dynamic tables, and more – but we’re going to focus on databases, schemas, and tables. Let’s start by cloning a Tasty Bytes table: CREATE TABLE frostbyte_tasty_bytes.raw_pos.truck_clone CLONE frostbyte_tasty_bytes.raw_pos.truck; If this looks familiar to you, it’s because we’ve run this code before, except last time we called the table “truck_dev” instead of “truck_clone.” So what is this doing? It’s creating a table named truck_clone, in the raw_pos schema in the frostbyte_tasty_bytes database, and it’s specifying that it wants this to be a clone of the truck table. You might recall that every database has two automatically generated schemas – information_schema, and public. Information_schema provides a bunch of data about the database, including information on how much data is being stored in a view called TABLE_STORAGE_METRICS. Let’s check that out for both our cloned table, truck_clone, and the original, truck: SELECT * FROM FROSTBYTE_TASTY_BYTES.INFORMATION_SCHEMA.TABLE_STORAGE_METRICS WHERE TABLE_NAME \= 'TRUCK_CLONE' OR TABLE_NAME \= 'TRUCK'; You can see that they have different IDs – they’re different tables – but they have the same CLONE_GROUP_ID, which is the “[Unique](https://docs.snowflake.com/en/sql-reference/info-schema/table_storage_metrics#columns) identifier for the oldest clone ancestor of this table,” and you can see this matches the ID of the original table. And if you scroll further to the right, you can see that only the original table has a number for “ACTIVE_BYTES.” The cloned table has a 0 here. “ACTIVE_BYTES” are “[Bytes](https://docs.snowflake.com/en/sql-reference/info-schema/table_storage_metrics#columns) owned by (and billed to) this table.” You can get similar information from the TABLE view in the INFORMATION_SCHEMA, and this view updates more regularly, so let’s take a look at that: SELECT * FROM FROSTBYTE_TASTY_BYTES.INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME \= 'TRUCK_CLONE' OR TABLE_NAME \= 'TRUCK'; You can see that the bytes listed is the same for the clone as for the table that was cloned, but we know from what we just saw in the TABLE_STORAGE_METRICS view that truck_clone isn’t actually taking up extra storage. Now to get a deeper sense of how this clone works, let’s make a change to our clone, and see what happens: We’ll do this by doubling the table size using an “insert into” command: INSERT INTO frostbyte_tasty_bytes.raw_pos.truck_clone SELECT * FROM frostbyte_tasty_bytes.raw_pos.truck; Basically we’re just taking the whole original table, and appending it to the clone, so now every one of the clone’s rows has a duplicate. Okay, so let’s check out our storage again from the TABLES view, since this updates right away: SELECT * FROM FROSTBYTE_TASTY_BYTES.INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME \= 'TRUCK_CLONE' OR TABLE_NAME \= 'TRUCK'; Cool, so the storage has gone up for the clone, and the storage remains unchanged for the original table, as you’d expect. If we waited a little while, we could check the TABLE_STORAGE_METRICS view and see that the “active bytes” number has gone up from zero in our clone. The syntax for creating clones is similar no matter what object you’re working with, so now that we’ve done this for tables, we can move pretty quickly through schemas and databases. Let’s clone our RAW_POS schema in FROST_BYTE_TASTY_BYTES: CREATE OR REPLACE SCHEMA frostbyte_tasty_bytes.raw_pos_clone CLONE frostbyte_tasty_bytes.raw_pos; If we refresh our menu on schemas on the left, we can see that raw_pos_clone is now visible. Let’s now do the same thing for a database: CREATE OR REPLACE DATABASE frostbyte_tasty_bytes_clone CLONE frostbyte_tasty_bytes; My understanding is this takes a bit longer because, even though it’s zero-copy for the data itself, there’s a lot of metadata to copy over. And again, if we refresh, we see the clone database – frostbyte_tasty_bytes_clone. We won’t do this here, but I also wanted to mention that you can clone a clone! You can get as deep as you want – clones of clones of clones. Snowflake takes care of managing all the underlying micropartitions so you don’t have to worry about that. The last thing I wanted to cover on cloning is that you can combine time travel and cloning – So you can clone an object as of its state at a past date, as long as that date is within the retention period. Let’s do that – Let’s clone our original truck table as of ten minutes ago. (There’s nothing special about this, because we haven’t changed the original truck table, but we can still see how the syntax works.) CREATE OR REPLACE TABLE frostbyte_tasty_bytes.raw_pos.truck_clone_time_travel CLONE frostbyte_tasty_bytes.raw_pos.truck AT(OFFSET \=> -60*10); It looks like it worked! SELECT * FROM frostbyte_tasty_bytes.raw_pos.truck_clone_time_travel; In this case, not that interesting because the data was the same as if we’d copied it in its current state, but a powerful tool when you need it. That’s it for cloning! To recap, we learned that cloning is zero-copy, and Snowflake tracks subsequent changes. We learned how to clone tables, schemas, and databases. We learned how to combine time travel and cloning. We learned how to use the INFORMATION_SCHEMA to check out the clone_group_id and the active_bytes columns in the TABLE_STORAGE_METRICS view. We also learned how to check out the bytes column in the TABLE view. Like time travel, cloning can be a little mind-bending. Soon we’ll return to solid ground by tackling exceptionally practical topics like resource monitors.

2. Let's practice!

Create Your Free Account

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.