Processing query results in batches
The team wants to push digital checkout rows to a downstream system in chunks rather than collect one huge DataFrame. Iterate through 5,000-row batches and capture a small summary of each.
A LazyFrame digital_rows with the filtered digital checkouts is preloaded.
Deze oefening maakt deel uit van de cursus
Scaling and Optimizing Data Pipelines with Polars
Oefeninstructies
- Iterate through
digital_rowsin batches of 5,000 rows on the streaming engine.
Interactieve oefening met praktijkervaring
Probeer deze oefening door deze voorbeeldcode aan te vullen.
batch_summaries = []
# Stream digital_rows in chunks of 5,000
for batch_no, batch in enumerate(
digital_rows.____(chunk_size=____, engine="streaming"),
start=1,
):
batch_summaries.append(
{
"batch": batch_no,
"rows": batch.height,
"checkouts": batch["checkouts"].sum(),
}
)
result = pl.DataFrame(batch_summaries)
print(result)