
Overriding the inferred schema

When Polars scans the vendor export, it infers column types from the first few rows. The branch_code column holds three-digit identifiers with leading zeros ("001", "002", …), but Polars sees only digits, parses the column as Int64, and silently drops the zeros. Override the schema so that branch_code stays a string.

The inferred schema is already printed for you, so you can see what Polars guessed without any overrides.

This exercise is part of the course

<Kurs>Scaling and Optimizing Data Pipelines with Polars</Kurs>

Exercise instructions

  • Override branch_code so it's read as pl.String.

Hands-on interactive exercise

Try this exercise by completing this sample code.

schema = pl.scan_csv(
    MESSY_CSV_PATH,
    separator=";",
    skip_rows=2,
    infer_schema_length=5,
    # Force branch_code to String
    schema_overrides={"____": pl.____},
).collect_schema()

print("\nOverridden schema:")
print(schema)