Summarizing Parquet data

The first Parquet-based report is the digital checkout summary that the team built in Chapter 1, this time starting from a scan_parquet query. Build the same lazy pipeline so the team can reuse the pattern across their archive.

The LazyFrame requests is already built for you from the Parquet file.

This exercise is part of the course

Scaling and Optimizing Data Pipelines with Polars

Exercise instructions

  • Filter requests to rows where use is "Digital".
  • Group the filtered rows by format.
  • Trigger execution at the very end of the pipeline.

Hands-on interactive exercise

Try this exercise by completing this sample code.

result = (
    requests
    # Filter to digital
    .filter(pl.col("use") == "____")
    # Group by format
    .group_by("____")
    .agg(pl.col("checkouts").sum().alias("total"))
    .sort("total", descending=True)
    # Trigger execution at the end
    .____()
)
print(result)