Transforming Data with Expressions
1. Cleaning text data
Hi, I'm Liam.
2. Meet your instructor
I'm an experienced data scientist and Polars contributor. I'll be your guide to transforming data with Polars.
3. Transformation Engine
Polars is a powerful engine that takes your tabular data and transforms it in parallel.
4. Is this course for you?
This course requires familiarity with creating a Polars DataFrame, using Polars expressions, and doing group-by aggregations. If you are not familiar with these, I recommend taking the Introduction to Polars course first.
5. Chapter 1
In this course, we'll learn how to transform data with Polars. We'll start by working with text data and creating conditional expressions.
6. Chapter 2
Ever struggled with timestamps? Chapter two dives into time series data. We'll learn window expressions - powerful tools for running totals and moving averages.
7. Chapter 3
Real-world analysis rarely lives in a single table. Chapter three shows you how to combine DataFrames - joining and merging data from multiple sources.
8. Chapter 4
Finally, we put it all together, building custom transformation pipelines and exploring advanced analytics, such as correlation. Let's dive in.
9. Meet our dataset
We start by importing polars as pl and reading our CSV with restaurant hygiene inspections in London. The dataset includes the name, location, restaurant type, hygiene rating, and capacity.
10. Restaurant recommendation app
Our goal is to build a restaurant recommendation app for London. But to recommend clean restaurants, we need clean data first. Notice these issues: some business names have leading whitespace - this causes duplicates when filtering. The rating and capacity are floats, but should be integers. And see how Costa Coffee appears twice? Without a unique identifier combining name and location, we can't tell them apart. Let's fix these one by one.
11. Casting dtype with an expression
Let's start with those float columns. To transform a column in place, we use .with_columns().
We create an expression on the rating column using pl.col, then chain .cast() with our target dtype of pl.Int64. Now the rating is stored as an integer.
13. Casting multiple columns
We need to cast multiple columns to integers. While we could do this with .with_columns(), a simpler approach is to use the .cast() method on the DataFrame itself.
Inside .cast(), we pass a Python dictionary.
We then specify that we want to transform all Float64 columns to Int64 columns, and confirm that this has worked.
16. Cleaning text data
With the dtypes fixed, let's tackle that whitespace issue. Some of the business names have whitespace at the start - we need to remove this so we can identify similar properties.
Polars has many expressions for working with text data in the .str namespace.
You can see the full set here in the official docs at this link. For our purposes, we need the strip_chars_start expression to remove leading whitespace.
We call .with_columns() to transform an existing column, create an expression on the business column, and apply the strip_chars_start expression to remove leading whitespace. Now we see that the names are consistently formatted.
23. Combining text data
Now for that identifier column. The dataset has businesses with the same name in different places, like Costa Coffee here.
We'd like to add a column that combines name and location to identify individual premises.
We use pl.concat_str to combine strings from different columns.
We pass the column names to combine - business and location in this case - separated by commas.
Then we provide a separator to place between the combined strings.
And we name the output column as id with the .alias() expression. This gives us our new identifier column.
29. Let's practice!
That was our introduction to transformations in Polars. Now, let's practice!