You've probably done some data engineering in the past

1. You've probably done some data engineering in the past

One thing that I love about data engineering is that its concepts are present in what might seem like everyday data tasks. If you've ever done anything with data in the past, there's a very good chance that you've performed data engineering, or some aspect of it, without even knowing it. For example, if you've ever used a spreadsheet to work with data, you've likely used the ingestion, transformation, and delivery framework.
Say you had some raw data in a CSV file and you imported it into a program like Excel or Google Sheets. When you did that, you performed data ingestion, and perhaps in the process of loading the data, you configured certain options, like specifying the data format, the delimiter, and more.
Once the raw data was loaded, you probably looked it over and started thinking about how you might extract certain insights from it. Say, for example, you wanted to analyze a certain aspect of that raw data, like sales, but you needed to do some work on the data before you could extract that insight. If there were any columns that you didn't need, maybe you deleted them. If there were some values that needed to be in, say, decimal format rather than whole numbers, you probably made that change as well. Maybe you needed to perform some calculations against existing columns to derive new columns. And there's even a chance that you had to create a single data set by combining multiple data sets. That entire process represents data transformation, where you took raw data and transformed it to get closer to your insight.
Most folks might stop there, at a final polished data set, in which case that data set is the data product being delivered. But maybe you took it a step further. Maybe you created pivot tables based on certain dimensions, or you decided to visualize certain aspects of the transformed data using a histogram, for example.
And last but not least, maybe you needed to share those final insights with someone, like a teammate or a client. Or maybe that polished data became part of a larger analysis by another team. That entire process is data delivery, where your final data product was delivered to a consumer, like yourself or your team, or to another data system.
And sure, maybe the scale was small. You had a few hundred rows or so, and maybe things were very manual, and you personally performed all of those steps on a daily or weekly basis. But the point is, you were actually building data pipelines, no matter how small the scale or how manual the process. And the amazing thing is that all the steps you took fit into the ingestion, transformation, and delivery framework that we'll use in this course to build data pipelines.
What happens, though, when the scale of the data increases 100-, 1,000-, or even 1,000,000-fold? Or when new data sources are introduced that must also be looked at? What if there are requirements to keep data fresh and insights current, such that your manual weekly process now needs to happen on an hourly basis? And what if your computer doesn't have the computing power to handle the processing that you need to perform against that data? You can imagine how those things would introduce some serious challenges into extracting insights from your raw data. That's why, in this course, you'll learn to use Snowflake to solve challenges like these and build scalable, end-to-end, continuous data pipelines. With that, let's get you set up with your Snowflake development environment.
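To make the framework concrete, the spreadsheet workflow described above can be sketched as a tiny pipeline in plain Python. This is only an illustration, not how the course builds pipelines in Snowflake: the sample data, the column names (`order_id`, `region`, `units`, `unit_price`, `internal_note`), and the function names are all made up for this example.

```python
import csv
import io

# Hypothetical raw data standing in for a CSV file; all column names are invented.
RAW_SALES_CSV = """order_id,region,units,unit_price,internal_note
1,East,3,10,ok
2,West,5,20,check
3,East,2,15,ok
"""


def ingest(raw_text, delimiter=","):
    """Ingestion: parse delimited text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text), delimiter=delimiter))


def transform(rows):
    """Transformation: drop an unneeded column, fix types, derive a new column."""
    cleaned = []
    for row in rows:
        units = int(row["units"])
        unit_price = float(row["unit_price"])  # whole number -> decimal
        cleaned.append({
            "order_id": int(row["order_id"]),
            "region": row["region"],
            "units": units,
            "unit_price": unit_price,
            "revenue": units * unit_price,  # derived from existing columns
            # "internal_note" is intentionally left out (the deleted column)
        })
    return cleaned


def summarize_by_region(rows):
    """A pivot-table-style aggregate: total revenue per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["revenue"]
    return totals


def deliver(rows):
    """Delivery: serialize the polished data set as CSV text for a consumer."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = transform(ingest(RAW_SALES_CSV))
revenue_by_region = summarize_by_region(rows)
polished_csv = deliver(rows)
```

Each function maps to one stage of the framework, so the manual steps from the spreadsheet become repeatable. The questions about scale, freshness, and computing power raised above are exactly what a small script like this does not address.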

2. Let's practice!
