Working with multiple columns

1. Working with multiple columns

Next, we'll see how Polars makes it easier to prepare data by working with multiple columns at once.

2. Using pl.col()

When designing property brochures, our marketing team needs to know the length of both property names and types to ensure they fit in their templates.

3. Using pl.col()

We pass both name and type to pl.col and then continue with the .str.len_chars expression to count the characters in each value. The name and type columns now show the number of characters that each value had in the original DataFrame.

4. Using pl.col() with dtypes

If we want to work with all columns that share the same dtype, we can pass a Polars dtype to pl.col. Here we pass pl.String to pl.col and continue as before with .str.len_chars. This gives the same output as before.

5. Using pl.col() with dtypes

The other dtypes that we could use to select columns in this DataFrame are pl.Int64, pl.Float64, and pl.Boolean.

6. Introducing selectors

Polars also has a set of functions called selectors for creating expressions from multiple columns. Selectors allow you to select all columns with similar dtypes. Here, we use pl.selectors.string to select all of the string columns.

7. Name matching with selectors

Now we need to report on the details of each property. We notice that all of the bedroom columns end with s. Selectors also have functions to get all columns whose names follow a similar pattern. Here we use selectors.ends_with to get all columns whose names end with s.

8. Combining selectors

However, we also need the string name and type columns for our report. We can combine selectors for more control over which columns are included. Here we select the string columns together with the columns that end with s using the pipe operator to get the full set of columns we need for our report.

9. Selectors overview

There are different selectors for different dtypes, as well as selectors for different column name patterns, such as the start or end letter. There is an excellent guide to selectors in the Polars docs at the link shown.

10. Adding a suffix to a column name

In our report we also need to display the maximum and minimum price and review score for our portfolio. It is straightforward to do this for a single aggregation on a column as we can keep the original column name. But if we need multiple aggregations we need the outputs to have distinct names.

11. Adding a suffix to a column name

We can ensure our aggregations have distinct names by ending each expression with .name.suffix. Here we end the .min expression with the suffix underscore min and the .max expression with the suffix underscore max, so that the output columns have distinct names in our report.

12. Excluding a column

Sometimes it's simpler to specify which columns to exclude from an operation. To format our report correctly, we need all non-boolean columns to be strings. We use pl.exclude("beach") to select every column except the boolean beach column, and then cast those columns to strings.

13. Let's practice!

Now it's time to practice with multiple columns.
