
Profiling a lazy query

Before adding a new query to a daily workflow, the team wants to see where the time is actually spent. Profiling executes the lazy query just as .collect() does. Alongside the query result, it also returns a timings DataFrame with one row per executed stage.

This exercise is part of the course

<cours>Scaling and Optimizing Data Pipelines with Polars</cours>

Exercise instructions

  • Filter library to checkouts where use is "Physical".
  • Keep the 10 longest titles by title_len using .top_k().
  • Execute the query with profiling so you also get a timings DataFrame.


Try this exercise by completing the sample code below.

import polars as pl

# `library` is the lazy (LazyFrame) checkout query loaded for this exercise
result, timings = (
    library
    # Filter to physical checkouts
    .filter(pl.col("use") == "____")
    .with_columns(pl.col("title").str.len_chars().alias("title_len"))
    # Keep the 10 longest titles
    .top_k(10, by="____")
    # Run with profiling to capture per-stage timings
    .____()
)
print(timings)