Snowpark DataFrames - Part I
When I was learning to cook, someone told me the secret to a good soup is that you don't stir it constantly. You set it up, you let it simmer, and you only check on it when you actually need to know something. I thought about that a lot when I first learned about Snowpark DataFrames, because Snowpark DataFrames work the same way. You set up your transformation, you walk away, and nothing actually happens until you explicitly ask for a result. Chefs call it patience. Snowflake calls it lazy evaluation. Either way, the soup is better for it. Let's get into it.

We spent a good chunk of this module learning about SQL features in Snowflake, like time travel, cloning, UDFs, and stored procedures. Now, we're going to shift gears and talk about something that opens up a whole different way of working with data in Snowflake: Snowpark DataFrames.

Snowpark is Snowflake's set of non-SQL capabilities. We mentioned it briefly earlier in the course. DataFrames are one of the most central parts of it. The idea is simple. Instead of writing SQL to transform your data, you write Python, and Snowflake runs that logic inside its own compute engine right next to your data, without you having to move anything.

If you've used a pandas DataFrame before, Snowpark's DataFrame will feel familiar. The key difference is that a pandas DataFrame lives in your local memory, while a Snowpark DataFrame doesn't. Instead, it's a reference to a transformation that will run in Snowflake when you're ready. That distinction matters a lot when your data has 600 million rows and your laptop has 16 gigabytes of RAM.

How does that look in practice though? Let's get into it. We're going to work in Snowflake Notebooks today. If you used an earlier version of this course, you may have seen Snowpark demoed in a Python Worksheet.
Notebooks are now the recommended environment for this kind of work because they let you run cells individually, mix SQL and Python in the same notebook, and they come with Snowpark pre-installed. No `pip install`s, no virtual environments, no setup friction. You just open a notebook and start writing Python.

The first thing you always do in a Snowflake Notebook is get your session. The session is your connection to Snowflake. Put another way, it's the object through which everything else happens. You do this by importing the `get_active_session` function, calling it, and assigning the result to a variable like we see here. If we run this, we'll see how simple it is. One import, one assignment. Because we're inside a Snowflake Notebook, the session is already configured with all the right credentials and permissions. We don't have to pass a username, password, or account name. Snowflake handles that for us.

Now, let's load some data. The most common way to create a Snowpark DataFrame is to point it at an existing Snowflake table using `session.table(...)`. At this point, no data has moved. No query has run against Snowflake. We've just created a reference to the data we want. This is what people mean when they say Snowpark DataFrames are lazily evaluated. The computation doesn't happen until you explicitly ask for it. The soup is still simmering.

To actually see the data, we call `.show()`. If we execute this, we can see that we have our menu data from the same Tasty Bytes dataset we've been working with throughout the course. `.show()` is one of the action methods, meaning a method that triggers actual execution. When we call `.show()`, Snowpark translates our DataFrame into SQL, sends it to Snowflake, and hands back the results.

Another common action method is `.collect()`, which returns all the results as a Python list of `Row` objects rather than printing them. Let's call `.collect()` and see what it returns when we run the cell. Easy enough.
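The session, table, `.show()`, and `.collect()` steps described above can be sketched roughly as follows. This is a minimal illustration, not the exact code from the demo: the table name `tasty_bytes.raw_pos.menu` and the helper names `preview_menu` and `run_in_notebook` are assumptions, and the `get_active_session` import sits inside a function only so the snippet can also be loaded outside a Snowflake Notebook.

```python
def preview_menu(session, table_name="tasty_bytes.raw_pos.menu"):
    """Build a lazy table reference, then trigger it with two actions.

    The default table name is an assumption; point it at your own menu table.
    """
    # Lazy: this only records which table we want. No query runs yet.
    menu_df = session.table(table_name)

    # Action: runs a query and prints the first rows (10 by default).
    menu_df.show()

    # Action: runs a query and returns every row as a list of Row objects.
    rows = menu_df.collect()
    first_row = rows[0] if rows else None
    return len(rows), first_row


def run_in_notebook():
    """Hypothetical entry point for a Snowflake Notebook cell."""
    # Inside a Snowflake Notebook the session is already authenticated,
    # so no username, password, or account name is passed here.
    from snowflake.snowpark.context import get_active_session

    session = get_active_session()
    row_count, first_row = preview_menu(session)
    print(f"{row_count} rows; first row: {first_row}")
```

In a notebook cell, calling `run_in_notebook()` would print the preview, then the row count and the first row. Keep in mind that `.collect()` pulls every row into Python, so it suits a small table like a menu better than a very large one.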
Now we know how many rows there are and can see the first one. Quick recap: use `.show()` when you want to visually inspect data. Use `.collect()` when you want to bring results into Python and work with them programmatically.

That's actions. Now, let's look at some transformations. The two most fundamental ones are `.filter()` for pulling specific rows and `.select()` for pulling specific columns, mirroring `WHERE` and the `SELECT` column list in SQL. To reference columns in Snowpark, we use the `col` function from the `snowflake.snowpark.functions` module. Ten rows, one per menu item for Freezing Point.

Now, let's say we only want two of those columns. We can specify this by using the `col` function and naming the columns we want like this.

One of the nicest things about the Snowpark DataFrame API is that you can chain these operations together in a single expression, with `.filter()` and `.select()` one after the other, like this. Same result, one chain of transformations. Nothing runs in Snowflake until we call `.show()` at the end.

There's also the `session.sql(...)` syntax, if you want to drop into raw SQL at any point and get a DataFrame back. Here, we'll do that by calling `session.sql(...)` and writing a `SELECT ... FROM ... WHERE ...` statement. Pretty familiar. Same output, different approach. Both are valid, so you should use whichever feels more natural for the task at hand.

Let's take a quick look at one more useful entry point. `session.create_dataframe(...)` lets you create a DataFrame from local Python data, which is handy for testing or loading small reference datasets. That syntax looks like this.

To recap Part I, we learned what Snowpark DataFrames are and why lazy evaluation matters. We set up a session in a Snowflake Notebook using `get_active_session()`.
We loaded data with `session.table()`, triggered execution with `.show()` and `.collect()`, filtered rows with `.filter()`, selected columns with `.select()`, chained transformations together, and saw how `session.sql()` and `session.create_dataframe()` give you additional entry points. In Part II, we'll cover aggregations, writing data back to Snowflake, converting to pandas, and how to connect to Snowflake from your own local development environment. Thank you for watching.
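As a companion to the recap, here is one way the transformation and entry-point steps from this part might look in code. Treat it as a sketch under stated assumptions, not the demo's actual cells: the table name, the column names `TRUCK_BRAND_NAME`, `MENU_ITEM_NAME`, and `ITEM_CATEGORY`, the function names, and the local sample rows are all placeholders chosen for illustration. The `col` import sits inside the function only so the snippet loads where Snowpark is not installed.

```python
# Assumed table name; replace with the menu table in your own account.
MENU_TABLE = "tasty_bytes.raw_pos.menu"


def freezing_point_items(session):
    """Chain .filter() and .select(); nothing runs until an action is called."""
    from snowflake.snowpark.functions import col

    return (
        session.table(MENU_TABLE)
        # Keep only rows for one truck brand (column name is an assumption).
        .filter(col("TRUCK_BRAND_NAME") == "Freezing Point")
        # Keep only two columns (column names are assumptions).
        .select(col("MENU_ITEM_NAME"), col("ITEM_CATEGORY"))
    )


def freezing_point_items_sql(session):
    """Same idea written as raw SQL; session.sql() also returns a DataFrame."""
    return session.sql(
        f"SELECT menu_item_name, item_category FROM {MENU_TABLE} "
        "WHERE truck_brand_name = 'Freezing Point'"
    )


def sample_sizes(session):
    """Build a small DataFrame from local Python data (made-up sample rows)."""
    return session.create_dataframe(
        [(1, "small"), (2, "medium"), (3, "large")],
        schema=["SIZE_ID", "SIZE_NAME"],
    )
```

The two query functions only describe the work. Calling an action on the result, for example `freezing_point_items(session).show()`, is what sends the generated SQL to Snowflake.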