Using a custom batch sink
For custom processing that Polars doesn't natively support, you can pass your own function to a sink, which calls it once per batch of rows. The team wants to see this pattern even though they'd normally use a built-in sink. Stream digital_rows through the predefined record_batch function.
digital_rows is preloaded. A record_batch(batch) function that records each batch's row count and checkout sum into batch_summaries is also defined for you.
This exercise is part of the course
<cours>Scaling and Optimizing Data Pipelines with Polars</cours>
Exercise instructions
- Stream digital_rows through the function in 5,000-row batches on the streaming engine.
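To make the callback pattern concrete, here is a minimal plain-Python sketch of the idea. It does not use Polars and is not how Polars implements batching. The stream_in_batches helper, the list-of-dicts row format, the "checkout" key, and the summary field names are all assumptions made for illustration. Only the names record_batch and batch_summaries come from the exercise, and their bodies here are hypothetical stand-ins.

```python
from itertools import islice

# Hypothetical stand-in for the exercise's batch_summaries list.
batch_summaries = []


def record_batch(batch):
    """Record the row count and checkout total of one batch (illustrative body)."""
    batch_summaries.append(
        {
            "rows": len(batch),
            "checkout_sum": sum(row["checkout"] for row in batch),
        }
    )


def stream_in_batches(rows, callback, chunk_size):
    """Hypothetical helper: feed rows to callback in chunks of at most chunk_size."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, chunk_size))
        if not batch:
            break
        callback(batch)


# Made-up data: 12,000 rows, each with a checkout value of 1.
rows = ({"checkout": 1} for _ in range(12_000))
stream_in_batches(rows, record_batch, chunk_size=5_000)

for summary in batch_summaries:
    print(summary)
```

Because the rows arrive through a generator and each batch is discarded after the callback returns, only one batch is held in memory at a time, which is the main appeal of a batch sink over collecting the full result first.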
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Stream batches through the record_batch function
digital_rows.____(
record_batch,
____=5_000,
____="streaming",
)
result = pl.DataFrame(batch_summaries)
print(result)