Snowflake Architecture Overview
Now that we’ve covered many of the core Snowflake objects, I think we’re ready to zoom out for a moment and talk about the overall Snowflake architecture. This video will differ a little from the others in that we’ll mostly go over slides that help us visualize the different parts of Snowflake’s architecture. But don’t worry, it’s worth it. Maybe you’ll even finish this video feeling like you just saw a picture of Earth from space for the first time. If you get even 10% of the way to that level of awe, I’ll take it.
Okay, so here are the critical things to know about Snowflake’s architecture. First off, it has four layers, as you can see here: optimized storage, elastic multi-cluster compute, cloud services, and Snowgrid. You’ll notice that storage is separated from compute, which is something Snowflake pioneered. For people who have been in the data world for a while, this is a really big deal, and it will probably never stop feeling like one. When I was a kid, my family drank powdered milk because it was cheaper, and even though I’ve only had regular milk for years, regular milk still feels special to me. I think that’s how a lot of people feel about the separation of storage and compute. Back in the day, storage and compute had to be scaled up together, and that was a major bottleneck for workloads. I suspect those who experienced this will never forget those scars and will never stop being grateful for the era we live in now. I came of age in the data world more recently, though, so for me, separate storage and compute is simply something I’ve come to expect.
In any case, let’s go through each of these layers one by one. The first layer is optimized storage, which lets you access all of your data in one place. You can store structured data, high-volume semi-structured data such as JSON, and even unstructured data like PDFs or images. This layer is built on cloud blob storage, which is great because it means you don’t have to migrate your data as it grows.
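To picture what separating storage from compute means, here is a toy model in Python. It is a simplified sketch under stated assumptions, not how Snowflake is implemented: the `SharedStorage` and `Warehouse` classes are hypothetical, "micro-partitions" are just tuples of integers, and a warehouse's "size" is simply the number of worker threads it uses to scan them. What it shows is that two compute clusters of different sizes can read the same stored data, and resizing one of them never touches the storage.

```python
# Toy model (hypothetical, not Snowflake's implementation): one shared
# storage layer read by several independently sized compute clusters.
from concurrent.futures import ThreadPoolExecutor


class SharedStorage:
    """Holds immutable 'micro-partitions' (here: tuples of ints)."""

    def __init__(self, partitions):
        self._partitions = tuple(tuple(p) for p in partitions)

    def partitions(self):
        return self._partitions


class Warehouse:
    """A compute cluster whose size (worker count) is set independently of storage."""

    def __init__(self, name, size):
        self.name = name
        self.size = size

    def resize(self, new_size):
        # Scaling compute changes only this object; the storage is untouched.
        self.size = new_size

    def sum_all(self, storage):
        # Scan the partitions in parallel with `size` workers, then combine.
        with ThreadPoolExecutor(max_workers=self.size) as pool:
            return sum(pool.map(sum, storage.partitions()))


storage = SharedStorage([[1, 2, 3], [4, 5], [6, 7, 8, 9]])
etl = Warehouse("etl_wh", size=4)  # hypothetical warehouse names
bi = Warehouse("bi_wh", size=1)

print(etl.sum_all(storage))  # → 45
print(bi.sum_all(storage))   # → 45
etl.resize(8)                # scale compute up; storage stays the same
print(etl.sum_all(storage))  # → 45
```

Both warehouses return the same answer because they scan the same partitions; only the amount of parallelism differs. In the older, coupled setup described above, adding compute would also have meant scaling storage along with it.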
Snowflake manages that data for you. For example, it automatically handles micro-partitioning, compression, and encryption. The storage is also flexible, so you can use the architecture you want: you can connect to data on-premises, data stored in open table formats like Apache Iceberg, and so on.
The second layer is elastic multi-cluster compute. As I mentioned earlier, Snowflake separates storage and compute, and both are highly scalable. At Snowflake, we describe them as “near-infinitely” scalable. To give you a sense of this, one Snowflake customer has a table with over 140 trillion rows, and a customer has executed more than 160,000 queries in a single one-minute interval. Another cool aspect of Snowflake is that it also separates compute from compute, which means multiple clusters can operate on the same data without competing for resources. We saw this earlier when we learned about virtual warehouses. We’ll talk about this more when we get to Snowpark, but Snowflake also lets you work in multiple languages: SQL, Python, or Java.
The third layer is cloud services, which manages a lot of important tasks. For example, it pushes upgrades automatically, in such a way that you don’t have to worry about migrations or experience downtime. Snowflake engineers constantly improve the platform’s performance, and those improvements carry over to your Snowflake experience without you having to do anything extra. The cloud services layer also manages files and file metadata, which enables ACID transactions, query result caching, Time Travel and zero-copy cloning (which we’ll learn about later), and high concurrency.
The fourth layer is Snowgrid, which lets businesses connect across regions and clouds. As we mentioned at the beginning of the course, Snowflake runs on Amazon Web Services, Microsoft Azure, and Google Cloud. Snowgrid makes data, services, and apps accessible.
Those data, services, and apps can be shared across teams, business units, partners, and customers, without the need for extra ETL, ELT, or integrations. Snowgrid also helps you maintain business continuity across regions and clouds: you can replicate and synchronize databases, accounts, pipelines, and more. It allows for resiliency, durability, and failover by choice, and it even lets you migrate between clouds as needed.
Okay, so that’s it for our discussion of Snowflake’s architecture! To recap, in this video we covered Snowflake’s four architectural layers: storage, compute, cloud services, and Snowgrid. We talked about how Snowflake’s storage lets you access your data all in one place, whether it’s structured, semi-structured, or unstructured. We talked about how Snowflake’s compute is near-infinitely scalable. We talked about how the cloud services layer manages files and file metadata, enabling things like Time Travel and zero-copy cloning. And we talked about how Snowgrid makes it easy to work across regions and clouds. If this all sounds abstract, just imagine a database admin from the year 2000 time-traveling to the present day and being introduced to Snowflake. I’m pretty sure their eyes would go wide, and that would make all of this feel very concrete.
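The recap mentions that the cloud services layer’s handling of file metadata is what enables Time Travel and zero-copy cloning. Here is a toy Python model of that idea, again a hypothetical sketch rather than Snowflake’s actual design: the `Storage` and `Table` classes are made up for illustration, a table is just a list of versions, and each version is a tuple of pointers (IDs) to immutable micro-partitions. Cloning copies only the current pointer list, and older versions stay readable. Real Time Travel works with timestamps and a data retention period, which this sketch leaves out.

```python
# Toy model (hypothetical, not Snowflake's design): tables are metadata
# pointing at immutable micro-partitions, so a clone copies pointers, not rows.
import itertools


class Storage:
    def __init__(self):
        self._next_id = itertools.count()
        self.partitions = {}  # partition id -> immutable tuple of rows

    def write(self, rows):
        pid = next(self._next_id)
        self.partitions[pid] = tuple(rows)
        return pid


class Table:
    def __init__(self, storage, versions=None):
        self.storage = storage
        # Each version is a tuple of partition ids; the last one is current.
        self.versions = list(versions) if versions is not None else [()]

    def insert(self, rows):
        pid = self.storage.write(rows)
        # New version = previous pointers + the new partition; old versions are kept.
        self.versions.append(self.versions[-1] + (pid,))

    def rows(self, version=-1):
        # Reading an older version is this toy's stand-in for Time Travel.
        return [r for pid in self.versions[version] for r in self.storage.partitions[pid]]

    def clone(self):
        # "Zero-copy": the clone starts with the same partition ids; no rows are copied.
        return Table(self.storage, [self.versions[-1]])


storage = Storage()
orders = Table(storage)
orders.insert([("o1", 10), ("o2", 20)])
dev = orders.clone()
orders.insert([("o3", 30)])
dev.insert([("d1", 99)])

print(len(storage.partitions))  # → 3
print(orders.rows())            # → [('o1', 10), ('o2', 20), ('o3', 30)]
print(dev.rows())               # → [('o1', 10), ('o2', 20), ('d1', 99)]
print(orders.rows(version=1))   # → [('o1', 10), ('o2', 20)]
```

After the clone, the two tables share partition 0 and each adds one partition of its own, so storage holds three partitions in total rather than a full second copy of the data.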