
Transforming columns

1. Transforming columns

Now let's learn how to transform our data using expressions.

2. Transforming columns

Real-world data is rarely ready for analysis out of the box. We usually need to transform it by cleaning values, applying business logic, and carrying out some analysis. Polars expressions offer a powerful way to do this. We'll prepare our rental property dataset for a marketing campaign that offers discounts on available listings.

3. Creating expressions: pl.col()

Before we can transform a column, we first need to create an expression that refers to it. We do this with pl.col() inside the .select() method. Here we create our first expression using pl.col("price"). With no transformations applied, we just get the price column back.

4. Arithmetic with expressions

Now we add arithmetic: multiplying pl.col("price") by 0.8 transforms each price into its price after a 20% discount.

5. Chaining transformations

To make discounted prices more customer-friendly, we'll round them to the nearest integer. We wrap the multiplication in parentheses and then call .round() on the result; without the parentheses, .round() would be attached to the number 0.8 instead of the product. Calling one expression method after another like this is called chaining.

6. Renaming an expression

The transformed column is still named price, but we want to call it discounted_price. To rename the output column, we add .alias() to the end of the chain.

7. Creating expressions from a constant

Beyond transforming existing columns, we sometimes need to add new information to our dataset. The pl.lit() expression creates an expression from a constant (literal) value. Here, we add a boolean column called "available" to mark the available properties. In this case, we pass multiple expressions to select, and the constant value is repeated for every row.

8. Parallel vs. serial processing

One powerful feature of Polars is that when we pass multiple expressions, it can run them in parallel. Running in parallel means the tasks execute at the same time, for example on different CPU cores. This is typically faster than serial processing, where only one task runs at a time.

9. Aggregating a column

We can also aggregate data with expressions in Polars. Suppose we are asked to compare the price of each property with the average and maximum price. We do this with the mean and max expressions. We rename these expressions with .alias() so that they are added as new columns instead of clashing with the existing price column. Each aggregate is a single value, so Polars repeats it on every row alongside the individual prices.

10. Built-in transformations

Polars has many built-in expressions beyond aggregations. For example, we can use the rank expression to rank the properties by price. The first two properties have the same price, so they receive the same rank.

11. Dtype-specific expressions

Some Polars expressions are dtype-specific. For example, we might want to convert the strings in the type column to lowercase so that we don't end up with variants that differ only in upper- and lowercase letters. Expressions that apply to only one dtype are grouped in a namespace; string expressions, for instance, live in the .str namespace. We convert the type column with the .str.to_lowercase expression, which gives us all-lowercase output.

12. Let's practice!

Time for some transformations!