Overriding the inferred schema
When Polars scans the vendor export, it infers column types from the first few rows (here, infer_schema_length=5). The branch_code column holds 3-digit identifiers with leading zeros ("001", "002", …); because the values contain only digits, Polars infers Int64 and silently drops the zeros. Override the schema so branch_code stays a string.
The inferred schema is already printed for you, so you can see what Polars guessed without any overrides.
This exercise is part of the course
<Kurs>Scaling and Optimizing Data Pipelines with Polars</Kurs>
Exercise instructions
- Override branch_code so it's read as pl.String.
Hands-on interactive exercise
Try this exercise by completing the sample code.
schema = pl.scan_csv(
    MESSY_CSV_PATH,
    separator=";",
    skip_rows=2,
    infer_schema_length=5,
    # Force branch_code to String
    schema_overrides={"____": pl.____},
).collect_schema()

print("\nOverridden schema:")
print(schema)