Overriding the inferred schema
When Polars scans the vendor export, it infers each column's type from the first few rows (how many is set by infer_schema_length). The branch_code column holds 3-digit identifiers with leading zeros ("001", "002", …), but those values look like integers, so Polars parses the column as Int64 and silently drops the leading zeros. Override the schema so branch_code stays a string.
The inferred schema is already printed for you, so you can see what Polars guessed without any overrides.
This exercise is part of the course Scaling and Optimizing Data Pipelines with Polars.
Exercise instructions
- Override branch_code so it's read as pl.String.
Have a go at this exercise by completing this sample code.
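import polars as pl

# MESSY_CSV_PATH is assumed to be defined by the exercise environment
schema = pl.scan_csv(
    MESSY_CSV_PATH,
    separator=";",          # the export is semicolon-delimited
    skip_rows=2,            # skip the first two lines before the header
    infer_schema_length=5,  # infer types from the first 5 rows
    # Force branch_code to String
    schema_overrides={"____": pl.____},
).collect_schema()

print("\nOverridden schema:")
print(schema)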