Scanning multiple files
The team's checkout data is now split across one CSV per year (seattle_2021.csv, seattle_2022.csv, seattle_2023.csv). These yearly files use the legacy column names usageclass and materialtype. Use a glob pattern to scan all files together as one logical dataset, then build a physical-checkouts summary.
polars is loaded as pl, and the directory containing the yearly files is available as MULTIFILE_DIR.
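As a rough illustration of what a glob pattern selects, the sketch below uses only the standard library. It creates empty placeholder files with the three yearly names in a temporary directory (a stand-in for MULTIFILE_DIR, which is not available here), adds a hypothetical notes.txt, and lists what the wildcard matches. The glob handling inside polars may differ in detail, but the wildcard idea is the same.

```python
import tempfile
from pathlib import Path

# Stand-in for MULTIFILE_DIR: a temporary directory with empty placeholder files
data_dir = Path(tempfile.mkdtemp())
file_names = ["seattle_2021.csv", "seattle_2022.csv", "seattle_2023.csv", "notes.txt"]
for name in file_names:
    (data_dir / name).touch()

# "*" stands for any run of characters, so every yearly CSV matches
# while the unrelated notes.txt (hypothetical) does not
matched = sorted(path.name for path in data_dir.glob("seattle_*.csv"))
print(matched)
# → ['seattle_2021.csv', 'seattle_2022.csv', 'seattle_2023.csv']
```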
This exercise is part of the course
Scaling and Optimizing Data Pipelines with Polars
Exercise instructions
- Scan every seattle_*.csv file in MULTIFILE_DIR using a glob pattern.
- Filter the combined dataset to "Physical" checkouts, then group by materialtype.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Scan every yearly file using a glob pattern
yearly_checkouts = pl.____(
    str(MULTIFILE_DIR / "____")
)

# Build a physical-checkout summary across the combined dataset
result = (
    yearly_checkouts
    # Filter to physical
    .filter(pl.col("usageclass") == "____")
    .group_by("____")
    .agg(pl.col("checkouts").sum().alias("total"))
    .sort("total", descending=True)
    .collect()
)

print(result)