BaşlayınÜcretsiz başlayın

Processing query results in batches

The team wants to push digital checkout rows to a downstream system in chunks rather than collect one huge DataFrame. Iterate through 5,000-row batches and capture a small summary of each.

A LazyFrame digital_rows with the filtered digital checkouts is preloaded.

Bu egzersiz, kursun bir parçasıdır

Scaling and Optimizing Data Pipelines with Polars

Kursa Göz Atın

Egzersiz talimatları

  • Iterate through digital_rows in batches of 5,000 rows on the streaming engine.

Uygulamalı etkileşimli egzersiz

Bu egzersizi bu örnek kodu tamamlayarak deneyin.

batch_summaries = []

# Stream digital_rows in chunks of 5,000
for batch_no, batch in enumerate(
    digital_rows.____(chunk_size=____, engine="streaming"),
    start=1,
):
    batch_summaries.append(
        {
            "batch": batch_no,
            "rows": batch.height,
            "checkouts": batch["checkouts"].sum(),
        }
    )

result = pl.DataFrame(batch_summaries)
print(result)
Kodu Düzenle ve Çalıştır