Writing a Parquet snapshot

A downstream dashboard needs a slimmer Parquet snapshot of digital ebook activity. Build the result lazily, then write it back out with explicit compression and row group settings to tune the file for fast reads.

The LazyFrame requests is available, and the export path is in PARQUET_EXPORT_PATH.

This exercise is part of the course

Scaling and Optimizing Data Pipelines with Polars

Exercise instructions

  • Keep only the first 500 digital rows for the snapshot.
  • Set compression_level to 5 when writing the Parquet file.
  • Set row_group_size to 250 rows.

Hands-on interactive exercise

Try this exercise by completing the sample code below.

result = (
    requests
    .filter(pl.col("use") == "Digital")
    .select("date", "format", "checkouts", "title")
    # Keep only the first 500 rows
    .____(500)
    .collect()
)

result.write_parquet(
    PARQUET_EXPORT_PATH,
    # Set compression level to 5
    compression_level=____,
    # Set 250 rows per row group
    row_group_size=____,
)

print(pl.read_parquet_schema(PARQUET_EXPORT_PATH))