Applying custom transformations
1. Applying custom transformations
In this lesson, we'll create custom Python functions when we need them, and then rewrite them as fast Polars expressions.
2. Venues dataset
We'll start with a small slice of our London venues dataset. We've received feedback that users prefer ratings out of 10 rather than 5, so we need to rescale the review column.
3. Rescale reviews
We decide to do this by passing a Python function to Polars. First, we define a function called rescale_review that doubles the input value.
4. Rescale reviews
We create an expression on the review column.
5. Rescale reviews
Then we pass our function to map_elements. This applies our function to each element and doubles the review scores.
6. Specify return_dtype for control
Polars tries to infer what the output type will be. However, for explicit control, we can pass return_dtype=pl.Float64.
7. Polars Warning
When we run this function with map_elements, Polars emits a warning. It tells us that applying Python functions is slow and that we should use native expressions instead.
8. Rescale reviews (native expression)
Here, 2 * pl.col("review") is faster and cleaner than a map_elements call.
9. Choose the right tool
Our default approach is always native expressions, as they are fast and can be optimized by Polars. However, map_elements can be useful when speed is less important and the logic is easier to follow in a pure Python function. map_elements is also essential if we need to call a function from a third-party Python package.
10. Standardize location text
Now, let's see how to make more complicated expressions reusable. We want to standardize the location column by replacing abbreviations and converting the text to uppercase.
11. Standardize locations
We start the expression with the location column inside with_columns.
12. Standardize locations
Then we chain replace_many to swap both abbreviations at once.
13. Standardize locations
We chain to_uppercase on the result and alias it as location_clean. Every location is now consistently formatted. However, we have other DataFrames where we want to apply the same chain of transformations to the location column.
14. Store an expression in a variable
To make the transformation reusable, we store the expression in a variable called standardize_locations_expr. This variable holds a Polars expression object.
15. Reuse the expression
Now we can use the expression on any DataFrame with a location column. We can pass it directly into with_columns(). However, this only works if the column is called location. To work with other column names, we create our own custom expression method with the core transformation logic.
16. Add a custom expression method
We first define a function called standardize. The input parameter represents whatever expression we pass in. Although we call the parameter input, it could be named anything.
17. Add a custom expression method
Inside, we return input chained with the same replace_many and to_uppercase expressions.
18. Add a custom expression method
Then we attach our function to the Polars expression class. In this case, we also name the method standardize, but we could use a different name. This assignment lets us call our custom method just like any built-in expression method.
19. Use the custom method
Now we apply our custom method to the address column of a similar DataFrame called restaurants. And we get the standardized address_clean column in the output.
20. Let's practice!
Now it's your turn to apply custom functions and build reusable expressions!