Summarizing Parquet data
The first Parquet-based report is the digital checkout summary the team built in Chapter 1, this time starting from a scan_parquet query. Build the same lazy pipeline so the team can reuse this pattern across their archive.
The LazyFrame requests has already been built for you from the Parquet file.
This exercise is part of the course
Scaling and Optimizing Data Pipelines with Polars
Exercise instructions
- Filter requests to rows where use is "Digital".
- Group the filtered rows by format.
- Trigger execution at the very end of the pipeline.
Try this exercise by completing the sample code.
import polars as pl

# requests is the LazyFrame already built from the Parquet file
result = (
    requests
    # Filter to digital
    .filter(pl.col("use") == "____")
    # Group by format
    .group_by("____")
    .agg(pl.col("checkouts").sum().alias("total"))
    .sort("total", descending=True)
    # Trigger execution at the end
    .____()
)
print(result)