Introducing lazy mode

1. Introducing lazy mode

Now we meet one of Polars' most powerful features: lazy mode.

2. Eager mode vs. lazy mode

Polars offers two modes to work with data - eager and lazy. Let's say our colleague asks for the name and price of every property in our rentals dataset. We start in eager mode with pl.read_csv, which loads all the rentals data into a DataFrame.

3. Eager mode vs. lazy mode

Alternatively, we can start a lazy query with pl.scan_csv. When we run pl.scan_csv, Polars does two things. First, it starts a query plan that sets out what we want to do. At this stage the plan is just to load the CSV. Second, it checks the first rows of the CSV to get the schema, that is, the column names and dtypes.

4. Eager mode vs. lazy mode

So if we run the eager query, we get the full DataFrame. But, if we run the lazy query we get a query plan. There'll be more on this later.

5. Eager mode vs. lazy mode

As our colleague only wants names and prices we need to select name and price. In eager mode Polars reads the full CSV into a DataFrame in the first line and then drops all columns apart from name and price in the second line.

6. Eager mode vs. lazy mode

Adding this select step in lazy mode updates the query plan. Polars optimizes the query plan and limits the amount of data it will load into a DataFrame to only the selected columns.

7. Optimized query plan

Ending a lazy query with .explain() shows the optimized query plan. The first line of the optimized plan is the scan of the CSV file. The second line says Project 2 out of 8 columns, meaning only 2 out of 8 columns should be loaded into a DataFrame. "Project" refers to projection pushdown, the technical name for limiting the columns.

8. Executing a lazy query

Now we turn our lazy query into a DataFrame to share with our colleague. We call .collect() at the end of the lazy query, which tells Polars to execute the optimized query plan and return a DataFrame. The result of the optimized query matches the eager query, but the lazy query is typically faster and uses less memory because it loads only the two columns we need.

9. Eager mode vs. lazy mode

The key difference between these modes is that in eager mode, Polars executes code one line at a time, whereas in lazy mode, Polars finds an optimized way to execute the full set of operations.

10. Eager mode vs. lazy mode

So if lazy mode is optimized, when should we use eager mode? Eager mode is best for seeing what happens step-by-step - as we do in this course. For similar reasons, eager mode is useful for debugging. We use lazy mode when we want to optimize the performance of a script for speed. As our rental properties dataset grows, we would start to see much faster performance with lazy mode.

11. Let's practice!

Now it's time to create your own lazy queries.