Parsing a messy CSV
A third-party ebook vendor exports Seattle digital checkouts as a semicolon-separated CSV with two metadata rows above the real header. Configure your scan to handle the layout so the team can preview a clean table.
polars is loaded as pl. The path to the vendor file is in MESSY_CSV_PATH.
This exercise is part of the course
<Kurs>Scaling and Optimizing Data Pipelines with Polars</Kurs>
Exercise instructions
- Skip the 2 metadata rows above the header.
- Tell Polars that the columns are separated by semicolons.
Interactive hands-on exercise
Try this exercise by completing this sample code.
result = pl.scan_csv(
    MESSY_CSV_PATH,
    # Skip the 2 metadata rows above the header
    skip_rows=____,
    # Columns are separated by semicolons
    separator="____",
).head(5).collect()
print(result)