
Scanning multiple files

The team's checkout data is now split across one CSV per year (seattle_2021.csv, seattle_2022.csv, seattle_2023.csv). These yearly files use the legacy column names usageclass and materialtype. Use a glob pattern to scan all files together as one logical dataset, then build a physical-checkouts summary.

polars is loaded as pl, and the directory is in MULTIFILE_DIR.

This exercise is part of the course

<cours>Scaling and Optimizing Data Pipelines with Polars</cours>

Exercise instructions

  • Scan every seattle_*.csv file in MULTIFILE_DIR using a glob pattern.
  • Filter the combined dataset to "Physical" checkouts, then group by materialtype.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Scan every yearly file using a glob pattern
yearly_checkouts = pl.____(
    str(MULTIFILE_DIR / "____")
)

# Build a physical-checkout summary across the combined dataset
result = (
    yearly_checkouts
    # Filter to physical
    .filter(pl.col("usageclass") == "____")
    .group_by("____")
    .agg(pl.col("checkouts").sum().alias("total"))
    .sort("total", descending=True)
    .collect()
)
print(result)