Analytics with SQL on DataFrames

SQL queries are often more concise and easier to read than the equivalent DataFrame operations. To run SQL against a DataFrame, however, you first need to register it as a temporary view; you can then query that view by name as if it were a table.

A SparkSession named spark and a DataFrame named salaries_df are already available in your workspace.

This exercise is part of the course

Introduction to PySpark

Instructions

  • Create a temporary view named "salaries_table" from the salaries_df DataFrame.
  • Construct a query that selects the "job_title" and "salary_in_usd" columns for rows where company_location is Canada ("CA").
  • Apply the SQL query and store the result in a new DataFrame, canada_titles.
  • Generate summary statistics for canada_titles.

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Create a temporary view of salaries_df named salaries_table
salaries_df.____('salaries_table')

# Construct the "query"
query = '''SELECT job_title, salary_in_usd FROM ____ WHERE company_location == "CA"'''

# Apply the SQL "query"
canada_titles = spark.____(____)

# Generate basic statistics
canada_titles.____().show()