Data transformations with Snowpark
1. Data transformations with Snowpark
If you're like me, then SQL might not be your preferred language of choice. I personally prefer Python, and I love that with Snowpark, I can use Python to perform my data transformations. With Snowpark, you can perform data transformations in Python, Java, or Scala. Snowpark allows you to configure the runtimes for these languages, meaning you can perform data transformations with, say, Python 3.9 or 3.10, or whichever version fits your language and use case best. Snowpark provides a DataFrame API for processing your data and performing your data transformations. The general pattern is that you'll create DataFrames using your data, perform whatever sorts of transformations you need to perform, and usually write those results back to new views or tables. If you've used PySpark prior to this, or any other data framework with a DataFrame API, your Snowpark experience will feel very similar. Finally, if you've ever attempted to perform large-scale data operations using your own computer's memory, then you know that it's almost too easy, and also very frustrating, to run out of memory to perform those operations. This is a common limiting factor in large-scale data analysis using your machine's local computing power. But with Snowpark, if you're developing locally, you can push your data transformations down to Snowflake's powerful compute engine. This means you never have to worry about running out of memory on your own computer. Personally, this is my favorite part about Snowpark. When it comes to where you can write and execute Snowpark code, you have options. Much of your code could be written in a script or codebase that uses your preferred language, that is, Python, Java, or Scala. If you're specifically interested in using Snowpark for Python, though, you could also write and run your code directly within a Snowflake notebook, or in VS Code using the Snowflake extension. In this video, we'll perform the transformations that we performed earlier using SQL, but this time using Snowpark for Python. It's okay that we'll perform the same transformations. The point of the exercise is to get you familiar with Snowpark for Python and its DataFrame API. Follow along with me to get a feel for using Snowpark for Python. Feel free to pause the video now if you need to log into your Snowflake account. Navigate to Projects and click on Notebooks. At the top right, click on the dropdown next to the Add Notebook button. Click on Import ipynb File. Navigate to the Module 3 folder and select the Hamburg Sales Snowpark file. Name the notebook Hamburg Sales Snowpark. For Database, select Tasty Bytes, and for Schema, select Harmonize. Leave the Warehouse set to Compute Warehouse. Click Create. Okay, we're inside of a Snowflake notebook, which is an incredibly powerful tool for executing SQL, Python, and building data pipelines. We're going to write Snowpark for Python code in this notebook, and as I mentioned before, we'll run through the same transformations we did in the previous video, but we'll skip right to creating the view for the wind speed data in Hamburg, Germany, rather than walking through each exploratory query. Again, the point is for you to become familiar with Snowpark for Python and its DataFrame API. Our Snowflake notebook is going to allow us to run individual cells that will allow us to execute small chunks of the code at a time. This will help us keep our place as we perform the transformations. Let's start at the top. I'll use the package picker to install Snowpark for Python. I'll search for Snowpark and click on the result. Okay, great. Just like that, it's installed. Next, I'll start by importing the Snowpark functions I want to use and create a Snowpark session. You can see this code in the Start Snowpark Session cell. Click on the Play button at the top right. The `get_active_session()` function within Snowpark makes it easy for me to start a Snowflake session without needing to manually configure a connections file. And just like that, I'm off to the races. Next, we'll load the DailyWeatherView. We can call the table function on our session object and pass in the name of the view. Here I'm passing in the fully qualified name to the view, meaning I'm including the database and schema name in the path to the view. I also create a new object called FilteredWeather that filters for the specific city and dates. And here's the beauty of the DataFrame API. I'm calling the filter function and I'm passing the filter criteria in a pretty straightforward way. Okay, let's run the cell. Now we can move to aggregations and sorting in the next cell. In this cell, the DataFrame API allows me to easily group by multiple columns. In this case, the country name, city name, and dates. And I create aggregations by selecting the MaxWindSpeed measurement for these dates. And I name the resulting column MaxWindSpeed 100 M MPH using the alias function. I then sort these results by date and save that to a DataFrame object called SortedWeather. Run the cell. I can take a look at SortedWeather by calling the show function on it. I also pass in 30 as an argument, meaning I want to look at 30 rows in the DataFrame. Run the cell and nice. I can see the exact view that we created earlier using SQL, but in DataFrame format. Let's create a view from these results and run the final cell. This will call the create or replace view function on SortedWeather and accept the name for the view. I'll name the view WindSpeed Hamburg Snowpark. You can see it was successfully created. And you can refresh on the left-hand side as a quick sanity check. There it is. Great job. In this video, you learned quite a few things about Snowpark. Let's quickly recap them. First, you learned what Snowpark is and the type of flexibility and power that it can bring to your data transformations with its DataFrame API. You learned how to use a Snowflake notebook to perform the transformations. You also learned how to start a Snowpark session, how to load data into a DataFrame, how to perform transformations on DataFrame objects, and how to save results to a view. Coming up, you'll learn how to scale your transformations with reusable code in user-defined functions.2. Let's practice!
Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.