
Overriding the inferred schema

When Polars scans the vendor export, it samples the first few rows to infer column types. The branch_code column holds 3-digit identifiers with leading zeros ("001", "002", …), but Polars reads those values as integers and types the column as Int64, silently dropping the zeros. Override the schema so branch_code stays a string.

The inferred schema is already printed for you, so you can see what Polars guessed without any overrides.

This exercise is part of the course

<cours>Scaling and Optimizing Data Pipelines with Polars</cours>

Exercise instructions

  • Override branch_code so it's read as pl.String.

Hands-on interactive exercise

Try this exercise by completing this sample code.

schema = pl.scan_csv(
    MESSY_CSV_PATH,
    separator=";",
    skip_rows=2,
    infer_schema_length=5,
    # Force branch_code to String
    schema_overrides={"____": pl.____},
).collect_schema()

print("\nOverridden schema:")
print(schema)