Get startedGet started for free

Multiplexing Parquet sinks

The team now needs two extracts from the same cleaned query: one for digital checkouts and one for physical. Build both writes lazily so Polars can plan the shared scan once, then run them together in one pass.

clean_checkouts is preloaded, along with DIGITAL_EXPORT_PATH and PHYSICAL_EXPORT_PATH.

This exercise is part of the course

Scaling and Optimizing Data Pipelines with Polars

View Course

Exercise instructions

  • Build both sinks lazily so they don't execute immediately.
  • Run both sinks together on the streaming engine.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Lazy sink for digital
digital_sink = (
    clean_checkouts
    .filter(pl.col("use") == "Digital")
    .sink_parquet(DIGITAL_EXPORT_PATH, lazy=____)
)

# Lazy sink for physical
physical_sink = (
    clean_checkouts
    .filter(pl.col("use") == "Physical")
    .sink_parquet(PHYSICAL_EXPORT_PATH, lazy=____)
)

# Run both sinks together
pl.____([digital_sink, physical_sink], engine="streaming")

# Check the row counts of each extract
result = pl.DataFrame(
    {
        "extract": ["digital", "physical"],
        "rows": [
            pl.scan_parquet(DIGITAL_EXPORT_PATH).select(pl.len()).collect().item(),
            pl.scan_parquet(PHYSICAL_EXPORT_PATH).select(pl.len()).collect().item(),
        ],
    }
)
print(result)
Edit and Run Code