Get startedGet started for free

Transforming data by group

1. Transforming data by group

Welcome back! In the previous lesson, we grouped and aggregated data. Now we'll take the next step: transforming each row with group-level statistics using window functions.

2. Why compare to group statistics?

When analyzing time series data, it helps to compare each value to the typical values for that group. For example, are the prices on this day higher or lower than normal?

3. Why compare to group statistics?

We can answer that by adding the average price for each hour.

4. Why compare to group statistics?

Now we can see that midday prices are relatively cheap.

5. Why compare to group statistics?

In contrast, evening prices are fairly normal. Next, let's calculate these hourly averages directly in Polars.

6. Extracting the hour

To get the hourly mean, we first add an hour column using the dt.hour expression.

7. Window function with .over()

To create our hourly average price, we start by taking the mean of the price column.

8. Window function with .over()

Then we add the over expression to group by the hour column.

9. Window function with .over()

Finally, we alias the result to create a new column.

10. Window function with .over()

Each row now shows the average price for its hour across all days. The average midday prices are around £40, but on this sunny day they're close to zero - relatively cheap!

11. Creating a weather category

For a more granular breakdown, we can compare prices at the same hour under similar weather. We'll classify an hour as sunny when solar exceeds 400 watts per square meter, and cloudy otherwise.

12. Creating a weather category

We do this using when/then expressions and call the column weather. The output shows data from 10 AM on three successive days with different weather conditions.

13. Grouping by multiple columns

Next, let's group by both hour and weather using the same pattern.

14. Grouping by multiple columns

Then we pass both grouping columns to .over(). This calculates the mean price for each combination of hour AND weather.

15. Grouping by multiple columns

And we alias the result to get a new column.

16. Grouping by multiple columns

Now each row shows the average price for its specific hour-weather combination. Sunny 10 AM rows share one average, while cloudy 10 AM rows share a higher average due to reduced solar generation.

17. Comparing to group average

A powerful use of window expressions is comparing each value to its group average. Here, we calculate how much each price differs

18. Comparing to group average

from the average price for that hour.

19. Comparing to group average

And we name this column appropriately. All three days show negative values, meaning prices were below the overall 10 AM average.

20. Other window aggregations

The .over() method works with many aggregations. You can calculate group sums, maximums, counts, and more, while preserving the original rows.

21. Let's practice!

Great work. You can now use window functions to compare each value to group-level statistics. Let's practice this in the exercises.

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.