Multiplexing Parquet sinks

The team now needs two extracts from the same cleaned query: one for digital checkouts and one for physical. Build both writes lazily so Polars can plan the shared scan once, then run them together in one pass.

clean_checkouts is preloaded, along with DIGITAL_EXPORT_PATH and PHYSICAL_EXPORT_PATH.

This exercise is part of the course

Scaling and Optimizing Data Pipelines with Polars

View Course

Exercise instructions

Build both sinks lazily so they don't execute immediately.
Run both sinks together on the streaming engine.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Lazy sink for digital
digital_sink = (
    clean_checkouts
    .filter(pl.col("use") == "Digital")
    .sink_parquet(DIGITAL_EXPORT_PATH, lazy=____)
)

# Lazy sink for physical
physical_sink = (
    clean_checkouts
    .filter(pl.col("use") == "Physical")
    .sink_parquet(PHYSICAL_EXPORT_PATH, lazy=____)
)

# Run both sinks together
pl.____([digital_sink, physical_sink], engine="streaming")

# Check the row counts of each extract
result = pl.DataFrame(
    {
        "extract": ["digital", "physical"],
        "rows": [
            pl.scan_parquet(DIGITAL_EXPORT_PATH).select(pl.len()).collect().item(),
            pl.scan_parquet(PHYSICAL_EXPORT_PATH).select(pl.len()).collect().item(),
        ],
    }
)
print(result)

Edit and Run Code