Analytics with SQL on DataFrames

SQL queries are often more concise and easier to read than the equivalent chain of DataFrame operations. Before you can run SQL against a DataFrame, however, you need to register it as a temporary view, which lets your queries refer to it by a table name.

A SparkSession named spark and a DataFrame named salaries_df are already available in your workspace.

This exercise is part of the course

Introduction to PySpark


Exercise instructions

  • Create a temporary view named "salaries_table" from the salaries_df DataFrame.
  • Construct a query that selects the "job_title" and "salary_in_usd" columns for rows where company_location is Canada ("CA").
  • Run the SQL query and store the result in a new DataFrame, canada_titles.
  • Show summary statistics for canada_titles.

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Create a temporary view named salaries_table from salaries_df
salaries_df.____('salaries_table')

# Construct the "query"
query = '''SELECT job_title, salary_in_usd FROM ____ WHERE company_location == "CA"'''

# Apply the SQL "query"
canada_titles = spark.____(____)

# Generate basic statistics
canada_titles.____().show()