Creating an Object dtype
For an ad-hoc text exploration, the team wants each movie's title as a Python set of lowercase words. A set isn't a native Polars dtype, so store it in a column of dtype pl.Object.
The DataFrame movies is available, and the helper title_to_word_set(title) is already defined and returns a Python set of lowercase words from a title.
This exercise is part of the course
Scaling and Optimizing Data Pipelines with Polars
Exercise instructions
- Run title_to_word_set on each movie_title and store the returned Python sets in a new column.
- Alias the new column as title_word_set.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
result = movies.with_columns(
    pl.col("movie_title")
    # Convert each title to a set of words
    .____(title_to_word_set, return_dtype=pl.____)
    .alias("____")
).select("movie_title", "title_word_set").head(8)
print(result)