Filtering rows
1. Filtering rows
Now we'll learn how to filter a DataFrame.
2. Why filter a DataFrame?
With our rentals dataset, potential clients will want to focus on properties that meet their criteria, such as beachfront properties or budget-friendly options. Filtering lets us extract the properties they're interested in.
3. Introducing filter
To filter a DataFrame, we use the filter method, which takes a predicate as its argument. A predicate is a condition that evaluates to either True or False. We can think of the predicate as a test applied to each row.
4. Introducing filter
Each row either passes the test or fails it.
5. Adding a predicate
Let's create a budget-friendly predicate that keeps properties priced under 500 using pl.col() and a comparison operator. We have 6 properties that match.
6. Combining conditions with AND
However, finding the ideal rental often requires meeting multiple criteria. For example, we may need properties priced under 500 that are by the beach.
7. Combining conditions with AND
We start with our pricing predicate.
8. Combining conditions with AND
We wrap this predicate in parentheses so that it can be combined with another predicate.
9. Combining conditions with AND
Then we use the AND operator, represented by the ampersand symbol (&), to tell Polars that all conditions must be true.
10. Combining conditions with AND
Finally, we add our second predicate for beachfront properties. The output shows six properties that meet both conditions.
11. Combining conditions with OR
Sometimes clients are flexible and will accept either budget-friendly properties OR highly rated ones, regardless of price.
12. Combining conditions with OR
We start again with our pricing predicate.
13. Combining conditions with OR
Now we introduce the OR operator, written as the pipe symbol (|), which tells Polars to include properties where either predicate is true.
14. Combining conditions with OR
Finally, we add the second predicate: properties with a review score greater than 9.5. The full predicate now reads: "Show me properties that EITHER cost less than 500 OR have a review score above 9.5." We get 18 such properties.
15. Filtering based on a list
When we have specific preferences, we can use the .is_in() method to check if a value is in our list of acceptable options. Here we filter for properties that are either Cottages or Villas.
16. Negating a predicate
We can negate a predicate, that is, reverse the condition, by adding the .not_() expression to it. Here we now see properties that are NOT Cottages or Villas.
17. Query optimizations
In a lazy query, Polars can apply query optimizations when we filter. Here we scan the CSV and then filter for Villas.
18. Query optimizations
Calling explain shows the optimized query plan. The output includes a line labelled SELECTION, which shows that Polars applies the filter while reading the CSV so that only the Villa properties are read into the DataFrame. This improves speed and reduces memory use.
19. Filter conditions
We've seen that we can create predicates by using comparison operators or using .is_in with a list. Polars has many other boolean expressions such as .is_between. You can learn more about these in the linked documentation.
20. Using standard Python comparison operators
In a Polars predicate, we can combine expressions with all the standard Python comparison operators.
21. Let's practice!
Let's practice.