Inizia subitoInizia gratis

Multiplexing Parquet sinks

The team now needs two extracts from the same cleaned query: one for digital checkouts and one for physical. Build both writes lazily so Polars can plan the shared scan once, then run them together in one pass.

clean_checkouts is preloaded, along with DIGITAL_EXPORT_PATH and PHYSICAL_EXPORT_PATH.

Questo esercizio fa parte del corso

Scaling and Optimizing Data Pipelines with Polars

Visualizza corso

Istruzioni dell'esercizio

  • Build both sinks lazily so they don't execute immediately.
  • Run both sinks together on the streaming engine.

esercizio interattivo pratico

Prova questo esercizio completando questo codice di esempio.

# Lazy sink for digital
digital_sink = (
    clean_checkouts
    .filter(pl.col("use") == "Digital")
    .sink_parquet(DIGITAL_EXPORT_PATH, lazy=____)
)

# Lazy sink for physical
physical_sink = (
    clean_checkouts
    .filter(pl.col("use") == "Physical")
    .sink_parquet(PHYSICAL_EXPORT_PATH, lazy=____)
)

# Run both sinks together
pl.____([digital_sink, physical_sink], engine="streaming")

# Check the row counts of each extract
result = pl.DataFrame(
    {
        "extract": ["digital", "physical"],
        "rows": [
            pl.scan_parquet(DIGITAL_EXPORT_PATH).select(pl.len()).collect().item(),
            pl.scan_parquet(PHYSICAL_EXPORT_PATH).select(pl.len()).collect().item(),
        ],
    }
)
print(result)
Modifica ed esegui il codice