Running sums using window function SQL
A window function is like an aggregate function, except that it gives an output for every row in the dataset instead of a single row per group.
You can do aggregation along with window functions. A running sum using a window function is simpler than what is required using joins. The query duration can also be much faster.
A table called schedule, having columns train_id, station, time, and diff_min is provided for you. The diff_min column gives the elapsed time between the current station and the next station on the line.
Bu egzersiz
Introduction to Spark SQL in Python
kursunun bir parçasıdırEgzersiz talimatları
- Run a query that adds an additional column to the records in this dataset called
running_total. The columnrunning_totalSUM()s the difference between station time given by thediff_mincolumn. - Run the query and display the result.
Uygulamalı interaktif egzersiz
Bu örnek kodu tamamlayarak bu egzersizi bitirin.
# Add col running_total that sums diff_min col in each group
query = """
SELECT train_id, station, time, diff_min,
____(____) OVER (PARTITION BY ____ ORDER BY ____) AS running_total
FROM schedule
"""
# Run the query and display the result
spark.____(query).show()