Writing a Parquet snapshot
A downstream dashboard needs a slimmer Parquet snapshot of digital ebook activity. Build the result lazily, then write it back out with explicit compression and row group settings to tune the file for fast reads.
The LazyFrame requests is available, and the export path is in PARQUET_EXPORT_PATH.
This exercise is part of the course
Scaling and Optimizing Data Pipelines with Polars
Exercise instructions
- Keep only the first 500 digital rows for the snapshot.
- Set compression_level to 5 when writing the Parquet file.
- Set row_group_size to 250 rows.
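As a rough sanity check on the row group setting, the number of row groups can be estimated from the row count. This is only a sketch: it assumes the writer fills each group up to the requested size, which may vary with the Polars version and how the data is chunked. The variable names are illustrative and not part of the exercise.

```python
import math

snapshot_rows = 500    # rows kept for the snapshot
row_group_size = 250   # rows per row group requested from the writer

# Assumption: each row group is filled to the requested size,
# so the count is the row total divided by the group size, rounded up.
expected_groups = math.ceil(snapshot_rows / row_group_size)
print(expected_groups)
```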
Hands-on interactive exercise
Try this exercise by completing this sample code.
result = (
    requests
    .filter(pl.col("use") == "Digital")
    .select("date", "format", "checkouts", "title")
    # Keep only the first 500 rows
    .____(500)
    .collect()
)

result.write_parquet(
    PARQUET_EXPORT_PATH,
    # Set compression level to 5
    compression_level=____,
    # Set 250 rows per row group
    row_group_size=____,
)

print(pl.read_parquet_schema(PARQUET_EXPORT_PATH))