CommencerCommencez gratuitement

Creating a Categorical column

Back to the movie dataset from the streaming startup. Most rows in original_language use the same handful of codes (en, fr, es, …). Casting to Categorical encodes each unique label as an integer behind the scenes, saving memory on repeated strings.

Cet exercice fait partie du cours

<cours>Scaling and Optimizing Data Pipelines with Polars</cours>
Voir le cours

Instructions de l’exercice

  • Cast the original_language column to pl.Categorical.

Exercice interactif pratique

Essayez cet exercice en complétant ce code d’exemple.

movies_cat = movies.with_columns(
    # Cast to Categorical to save memory
    pl.col("original_language").____(pl.____)
)

result = movies_cat.select("movie_title", "original_language").head(8)
print(result)
Modifier et exécuter le code