
Profiling a lazy query

Before adding a new query to a daily workflow, the team wants to see where time is actually spent. Profiling runs the lazy query like .collect() does, but also returns a timings DataFrame with one row per stage.

This exercise is part of the course

Scaling and Optimizing Data Pipelines with Polars

Exercise instructions

  • Filter library to checkouts where use is "Physical".
  • Keep the 10 longest titles by title_len using .top_k().
  • Execute the query with profiling so you also get a timings DataFrame.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

result, timings = (
    library
    # Filter to physical checkouts
    .filter(pl.col("use") == "____")
    .with_columns(pl.col("title").str.len_chars().alias("title_len"))
    # Keep the 10 longest titles
    .top_k(10, by="____")
    # Run with profiling to capture per-stage timings
    .____()
)
print(timings)