Get startedGet started for free

Creating a Categorical column

Back to the movie dataset from the streaming startup. Most rows in original_language use the same handful of codes (en, fr, es, …). Casting to Categorical encodes each unique label as an integer behind the scenes, saving memory on repeated strings.

This exercise is part of the course

Scaling and Optimizing Data Pipelines with Polars

View Course

Exercise instructions

  • Cast the original_language column to pl.Categorical.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

movies_cat = movies.with_columns(
    # Cast to Categorical to save memory
    pl.col("original_language").____(pl.____)
)

result = movies_cat.select("movie_title", "original_language").head(8)
print(result)
Edit and Run Code