Get startedGet started for free

Data transformations with SQL

1. Data transformations with SQL

We're going to be building a pipeline that tracks anomalous sales data and daily sales metrics for specific cities where Tasty Bytes operates. You've already loaded all of the raw data into your Snowflake account necessary to do this. If you haven't, ensure you complete the corresponding reading in the previous module. With the raw data now in Snowflake, we can begin transforming it. In this video, you'll learn how to use SQL to perform those transformations. Before getting into the details, let's go over a couple of things. First, this isn't a SQL course, so I won't focus on teaching you SQL syntax. But you also don't need to be a SQL expert to follow along. What I'd like for you to focus on is how and where transformations might be performed. You can always look up SQL syntax independently. Second, this isn't a data modeling course. We won't cover the theory behind how to structure or model your data. The goal is simply for you to learn just enough to competently perform transformations and then do them on your own, in your pipelines. Finally, Snowflake supports the common, standardized version of ANSI SQL, but Snowflake also has its own SQL dialect, with slight variations that are extremely handy. This is mostly a heads up in case you ever come across SQL in Snowflake that might look different from SQL that you might be used to. Alright, let's get started with some transformations. It's a good time to pause the video and make sure you're logged into your Snowflake account. We've learned from sales analysts on the Tasty Bytes team that sales in the city of Hamburg, Germany mysteriously dropped to zero for nearly the whole month of February. We've been tasked with wrangling the data and figuring out what may have happened. Let's use the raw data we've loaded and perform some transformations to pinpoint the exact cause. Start by creating a new SQL worksheet in your Snowflake account. Next, open the Hamburg sales SQL file in the Module 3 folder of the repo. Copy the contents of the file and paste them into the SQL worksheet you just created. Don't run the file just yet. Let's set our context by running only the first three lines of code. Okay, great. Let's start performing transformations to figure out the root cause that affected sales. Let's figure out the exact dates in February when sales dropped to zero in Hamburg. This first block of SQL will help us figure this out. Let's go ahead and run it. Okay, it looks like sales were zero for February 1 through February 24. Naturally, the next question is, why? Could weather have impacted sales? Let's keep digging. To start, we'll create a view that adds weather data to the cities where Tasty Bytes operates. The weather data is pulled from the Marketplace weather dataset that you loaded earlier. Run the second block of SQL, which creates a view called Daily Weather V in the Harmonized schema. Excellent. Let's dig some more. I wonder, what was the temperature on those days? Could it have been too unbearable to go outside on those days? Maybe that impacted sales. The third block of SQL will help us determine this. It'll return the temperature in Hamburg for those specific dates. Run this block of SQL. Hmm. The average temperature for Hamburg in February tends to fall between 30 and 40 degrees Fahrenheit. At first glance, the temperature column doesn't seem to show anything too out of the ordinary. I'll click on Chart here to quickly visualize the results, and yeah, pretty standard. Could it have been something else? Let's run the final block of SQL. This block will return the wind speed on those dates. Wow. On some of these days, the wind speeds reach nearly 70 miles an hour. I'll click on Chart once more to quickly visualize the results, and yeah, we see that on average, the wind speed approached either tropical storm or hurricane-force winds. I definitely wouldn't go outside in conditions like that, not even on Taco Tuesday. Okay, it seems we might want to keep track of the wind speed in Hamburg. Run the last block of SQL, which will create a view that does this. We'll use this view to keep an eye on things and alert our team members in the future. Great job. We used SQL to perform some transformations against our data to help pinpoint an issue. We also created a view with valuable insights that we can share later. We worked with a couple of datasets, and the transformations were relatively straightforward. But in practice, you may be tasked with performing transformations across tens, hundreds, maybe even thousands of tables. And you may be working with several team members to do this. So keep in mind that the types of transformations that you'll perform will depend heavily on your use case. One final note before we move on to performing transformations in other languages. Note that we left our raw data completely untouched. Instead, we apply transformations and materialize the results into new objects, like our view. This helps us to reuse the raw data as much as possible without compromising its integrity. In this video, you learned how to perform data transformations against your raw data in SQL. And if SQL isn't your thing, then I'm hoping Python, Java, or Scala might be. In the next lesson, you'll learn how to use Snowpark to perform transformations in these other languages.

2. Let's practice!

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.