
Applying custom transformations

1. Applying custom transformations

In this lesson, we'll create custom functions when we need them, and then rewrite them as fast Polars expressions.

2. Venues dataset

We'll start with a small slice of our London venues dataset. We've received feedback that users prefer ratings out of 10 rather than 5, so we need to rescale the review column.

3. Rescale reviews

We decide to do this by passing a Python function to Polars. First, we define a function called rescale_review that doubles the input value.

4. Rescale reviews

We create an expression on the review column.

5. Rescale reviews

Then we pass our function to map_elements, which applies it to each element and doubles the review scores.

6. Specify return_dtype for control

Polars tries to infer what the output type will be. However, for explicit control, we can pass return_dtype=pl.Float64.

7. Polars Warning

When we run this function with map_elements, Polars raises a warning. Polars tells us that applying Python functions is slow, and we should use native expressions instead.

8. Rescale reviews (native expression)

Here, 2 * pl.col("review") is faster and cleaner than a map_elements call.

9. Choose the right tool

Our default should always be native expressions, as they are fast and can be optimized by Polars. However, map_elements can be useful when speed is less important and the logic is easier to follow in a pure Python function. map_elements is also essential if we need to call a function from a third-party Python package.

10. Standardize location text

Now, let's see how to make more complicated expressions reusable. We want to standardize the location column by replacing abbreviations with their full forms and converting the text to uppercase.

11. Standardize locations

We start the expression with the location column inside with_columns.

12. Standardize locations

Then we chain str.replace_many to swap both abbreviations at once.

13. Standardize locations

We chain to_uppercase on the result and alias it as location_clean. Every location is now consistently formatted. However, we have other DataFrames where we want to apply the same chain of transformations to the location column.

14. Store an expression in a variable

To make the transformation reusable, we store the expression in a variable called standardize_locations_expr. The variable holds a Polars expression object, which does nothing until we pass it to a DataFrame.

15. Reuse the expression

Now we can use the expression on any DataFrame with a location column. We can pass it directly into with_columns(). However, this only works if the column is called location. To work with other column names, we create our own custom expression with the core transformation logic.

16. Add a custom expression method

We first define a function called standardize. The input parameter represents whatever expression we pass in. Although we call the parameter input, it could be named anything.

17. Add a custom expression method

Inside, we return input chained with the same replace_many and to_uppercase expressions.

18. Add a custom expression method

Then we attach our function to the Polars expression class. In this case, we name the method standardize as well, but we could use a different name. This assignment lets us call our custom method just like any built-in expression method.

19. Use the custom method

Now we apply our custom method to the address column of a similar DataFrame called restaurants, and we get the standardized address_clean column in the output.

20. Let's practice!

Now it's your turn to apply custom functions and build reusable expressions!
