Transforming data by group
1. Transforming data by group
Welcome back! In the previous lesson, we grouped and aggregated data. Now we'll take the next step: transforming each row with group-level statistics using window functions.2. Why compare to group statistics?
When analyzing time series data, it helps to compare each value to the typical values for that group. For example, are the prices on this day higher or lower than normal?3. Why compare to group statistics?
We can answer that by adding the average price for each hour.4. Why compare to group statistics?
Now we can see that midday prices are relatively cheap.5. Why compare to group statistics?
In contrast, evening prices are fairly normal. Next, let's calculate these hourly averages directly in Polars.6. Extracting the hour
To get the hourly mean, we first add an hour column using the dt.hour expression.7. Window function with .over()
To create our hourly average price, we start by taking the mean of the price column.8. Window function with .over()
Then we add the over expression to group by the hour column.9. Window function with .over()
Finally, we alias the result to create a new column.10. Window function with .over()
Each row now shows the average price for its hour across all days. The average midday prices are around £40, but on this sunny day they're close to zero - relatively cheap!11. Creating a weather category
For a more granular breakdown, we can compare prices at the same hour under similar weather. We'll classify an hour as sunny when solar exceeds 400 watts per square meter, and cloudy otherwise.12. Creating a weather category
We do this using when/then expressions and call the column weather. The output shows data from 10 AM on three successive days with different weather conditions.13. Grouping by multiple columns
Next, let's group by both hour and weather using the same pattern.14. Grouping by multiple columns
Then we pass both grouping columns to .over(). This calculates the mean price for each combination of hour AND weather.15. Grouping by multiple columns
And we alias the result to get a new column.16. Grouping by multiple columns
Now each row shows the average price for its specific hour-weather combination. Sunny 10 AM rows share one average, while cloudy 10 AM rows share a higher average due to reduced solar generation.17. Comparing to group average
A powerful use of window expressions is comparing each value to its group average. Here, we calculate how much each price differs18. Comparing to group average
from the average price for that hour.19. Comparing to group average
And we name this column appropriately. All three days show negative values, meaning prices were below the overall 10 AM average.20. Other window aggregations
The .over() method works with many aggregations. You can calculate group sums, maximums, counts, and more, while preserving the original rows.21. Let's practice!
Great work. You can now use window functions to compare each value to group-level statistics. Let's practice this in the exercises.Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.