1. Filtering rows
Now we'll learn
how to filter a DataFrame.
2. Why filter a DataFrame?
With our rentals dataset,
potential clients will want to focus on properties
that meet their criteria, such as beachfront properties or budget-friendly options. Filtering lets us extract the properties they're interested in.
3. Introducing filter
To filter a DataFrame,
we use the filter method, which takes a predicate as its argument. A predicate is a condition that evaluates
to either True or False. We can think of the predicate as a test that each row either
passes
4. Introducing filter
or fails.
5. Adding a predicate
Let's create a budget-friendly predicate
that keeps properties priced under 500 using pl.col() and a comparison operator.
We have 6 properties that match.
6. Combining conditions with AND
However, finding the ideal rental often requires meeting multiple criteria. For example,
we may need properties priced under 500 that are by the beach.
7. Combining conditions with AND
We start with our pricing predicate,
8. Combining conditions with AND
and wrap this in parentheses to combine it with another predicate. The parentheses are needed because, in Python, & binds more tightly than comparison operators.
9. Combining conditions with AND
Then we use the AND operator, represented by the ampersand symbol (&), to tell Polars that all conditions must be true.
10. Combining conditions with AND
Finally, we add our second predicate for beachfront properties.
The output shows six properties that meet both conditions.
11. Combining conditions with OR
Sometimes clients are flexible and will accept either budget-friendly properties OR highly rated ones, regardless of price.
12. Combining conditions with OR
We start again with our pricing predicate.
13. Combining conditions with OR
Now we introduce the OR operator, written as the pipe symbol (|), which tells Polars to include properties where either predicate is true.
14. Combining conditions with OR
Finally, we add the second predicate: properties with a review score greater than 9.5. The full predicate now reads: "Show me properties that EITHER cost less than 500 OR have a review score above 9.5."
We get 18 such properties.
15. Filtering based on a list
When we have specific preferences,
we can use the .is_in() method to check if a value is in our list of acceptable options. Here we filter for
properties that are either Cottages or Villas.
16. Negating a predicate
We can negate a predicate, that is, reverse the condition,
by appending the .not_() method to the expression.
Now we see only properties that are neither Cottages nor Villas.
17. Query optimizations
In a lazy query, Polars can optimize how a filter is applied.
Here we scan the CSV and then filter for Villas.
18. Query optimizations
Calling explain shows the optimized query plan.
The output includes a line labeled SELECTION, which shows that Polars applies the filter while scanning the CSV, so only the Villa rows are read into the DataFrame. This improves speed and reduces memory use.
19. Filter conditions
We've seen that we can create predicates by
using comparison operators or
using .is_in with a list. Polars has many other boolean expressions
such as .is_between. You can learn more about these in the linked documentation.
20. Using standard Python comparison operators
In a Polars predicate, we can build expressions with any of the standard Python comparison operators, such as ==, !=, <, <=, >, and >=.
21. Let's practice!
Let's practice.