Profiling a lazy query
Before adding a new query to a daily workflow, the team wants to see where time is actually spent. Profiling executes the lazy query just as .collect() does, but returns two DataFrames instead of one: the query result and a timings DataFrame with one row per executed stage.
This exercise is part of the course
Scaling and Optimizing Data Pipelines with Polars
Exercise instructions
- Filter library to checkouts where use is "Physical".
- Keep the 10 longest titles by title_len using .top_k().
- Execute the query with profiling so you also get a timings DataFrame.
Hands-on interactive exercise
Try this exercise by completing this sample code.
result, timings = (
    library
    # Filter to physical checkouts
    .filter(pl.col("use") == "____")
    .with_columns(pl.col("title").str.len_chars().alias("title_len"))
    # Keep the 10 longest titles
    .top_k(10, by="____")
    # Run with profiling to capture per-stage timings
    .____()
)
print(timings)