Filtering rows

1. Filtering rows

Now we'll learn how to filter a DataFrame.

2. Why filter a DataFrame?

With our rentals dataset, potential clients will want to focus on properties that meet their criteria - such as beachfront properties, or budget-friendly options. Filtering lets us extract the properties they're interested in.

3. Introducing filter

To filter a DataFrame, we use the filter method, which takes a predicate as its argument. A predicate is a condition that evaluates to either True or False for each row. We can think of the predicate as a test that each row either passes

4. Introducing filter

or fails.

5. Adding a predicate

Let's create a budget-friendly predicate that keeps properties priced under 500 using pl.col() and a comparison operator. We have 6 properties that match.

6. Combining conditions with AND

However, finding the ideal rental often requires meeting multiple criteria. For example, we may need properties priced under 500 that are by the beach.

7. Combining conditions with AND

We start with our pricing predicate,

8. Combining conditions with AND

and wrap this in parentheses to combine it with another predicate.

9. Combining conditions with AND

Then we use the AND operator, represented by the ampersand symbol (&), to tell Polars that all conditions must be true.

10. Combining conditions with AND

Finally, we add our second predicate for beachfront properties. The output shows six properties that meet both conditions.

11. Combining conditions with OR

Sometimes clients are flexible and will accept either budget-friendly properties OR highly-rated ones, regardless of price.

12. Combining conditions with OR

We start again with our pricing predicate.

13. Combining conditions with OR

Now we introduce the OR operator, written as the pipe symbol (|), which tells Polars to include properties where either predicate is true.

14. Combining conditions with OR

Finally, we add the second predicate: properties with a review score greater than 9.5. The full predicate now reads: "Show me properties that EITHER cost less than 500 OR have a review score above 9.5." We get 18 such properties.

15. Filtering based on a list

When we have specific preferences, we can use the .is_in() method to check if a value is in our list of acceptable options. Here we filter for properties that are either Cottages or Villas.

16. Negating a predicate

We can negate a predicate, that is, reverse its condition, by appending the .not_() method to the expression. Here we now see properties that are neither Cottages nor Villas.

17. Query optimizations

In a lazy query, Polars can apply query optimizations to a filter. Here we scan the CSV and then filter for Villas.

18. Query optimizations

Calling explain shows the optimized query plan. The output includes a row called SELECTION, which shows that Polars will apply the filter while reading the CSV, so only the Villa properties are read into the DataFrame, improving speed and reducing memory use.

19. Filter conditions

We've seen that we can create predicates by using comparison operators or by using .is_in with a list. Polars has many other boolean expressions, such as .is_between. You can learn more about these in the linked documentation.

20. Using standard Python comparison operators

In a Polars predicate, we can compare expressions using all the standard Python comparison operators: ==, !=, <, <=, > and >=.

21. Let's practice!

Let's practice.