Exercise

Scanning multiple files

The team's checkout data is now split across one CSV per year (seattle_2021.csv, seattle_2022.csv, seattle_2023.csv). These yearly files use the legacy column names usageclass and materialtype. Use a glob pattern to scan all files together as one logical dataset, then build a physical-checkouts summary.

polars is loaded as pl, and the path to the directory containing the files is stored in MULTIFILE_DIR.

Instructions

  • Scan every seattle_*.csv file in MULTIFILE_DIR using a glob pattern.
  • Filter the combined dataset to "Physical" checkouts, then group by materialtype.