Querying on a temp view
In this exercise, you'll practice registering a DataFrame as a temporary SQL view in PySpark. A temporary view registers a DataFrame under a name in the current SparkSession, so you can query it with SQL syntax, which often makes complex data manipulations easier to express and read. The view lasts only as long as that session. Your goal is to create a view from a provided DataFrame and run SQL queries against it, a common task in ETL and ELT pipelines.
A SparkSession, spark, and a PySpark DataFrame, df, are already available in your workspace.
This exercise is part of the course Introduction to PySpark.
Exercise instructions
- Register a new view called "data_view" from the DataFrame df.
- Run the provided SQL query to calculate total salary by position.
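# Register df as a temporary view named "data_view" (replaces any existing view with that name)
df.createOrReplaceTempView("data_view")

# SQL query: calculate total salary by Position, highest total first
result = spark.sql("""
    SELECT Position, SUM(Salary) AS Total_Salary
    FROM data_view
    GROUP BY Position
    ORDER BY Total_Salary DESC
""")

# Display the aggregated result
result.show()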