
Querying on a temp view

In this exercise, you'll practice registering a DataFrame as a temporary SQL view in PySpark. A temporary view gives a DataFrame a name that SQL queries can reference. That lets you write filters, joins, and aggregations in familiar SQL syntax instead of as chains of DataFrame methods. Your goal is to create a view from a provided DataFrame and run a SQL query against it, a common pattern in ETL and ELT pipelines.

You already have a SparkSession, spark, and a PySpark DataFrame, df, available in your workspace.

This exercise is part of the course

Introduction to PySpark


Exercise instructions

  • Register a new view called "data_view" from the DataFrame df.
  • Run the provided SQL query to calculate total salary by position.

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Register as a view
df.____("data_view")

# Advanced SQL query: Calculate total salary by Position
result = ____("""
    SELECT Position, SUM(Salary) AS Total_Salary
    FROM data_view
    GROUP BY Position
    ORDER BY Total_Salary DESC
    """
)
result.show()