Analytics with SQL on DataFrames
SQL queries are concise and easy to run compared to DataFrame operations. To run SQL queries against a DataFrame, you first need to register the DataFrame as a temporary view (a table-like name), and then apply your SQL queries to that view.
You already have a SparkSession spark and the DataFrame salaries_df available in your workspace.
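To make the pattern concrete before you start, here is a minimal, self-contained sketch (with made-up example data, not the actual salaries_df) of how a DataFrame is registered as a temporary view and then queried with SQL:

from pyspark.sql import SparkSession

# In the exercise, spark is already provided; this builder call only makes the sketch runnable on its own
spark = SparkSession.builder.appName("sql_on_dataframes").getOrCreate()

# A small illustrative DataFrame with the same column names the exercise uses
example_df = spark.createDataFrame(
    [("Data Engineer", 120000, "CA"), ("Data Scientist", 140000, "US")],
    ["job_title", "salary_in_usd", "company_location"],
)

# Register the DataFrame under a name that SQL queries can reference
example_df.createOrReplaceTempView("example_table")

# Run a SQL query against the view; the result is a new DataFrame
spark.sql("SELECT job_title FROM example_table WHERE company_location = 'CA'").show()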
This exercise is part of the course Introduction to PySpark.
Instructions
- Create a temporary table "salaries_table" from the salaries_df DataFrame.
- Construct a query to extract the "job_title" column for rows where company_location is Canada ("CA").
- Apply the SQL query and create a new DataFrame canada_titles.
- Get a summary of the table.
Interactive hands-on exercise
Try this exercise by completing the sample code below.
# Create a temporary view of salaries_df named salaries_table
salaries_df.____('salaries_table')
# Construct the "query"
query = '''SELECT job_title, salary_in_usd FROM ____ WHERE company_location == "CA"'''
# Apply the SQL "query"
canada_titles = spark.____(____)
# Generate basic statistics
canada_titles.____().show()
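For reference, one way the blanks above could be filled in, assuming the standard PySpark API (createOrReplaceTempView to register the view, spark.sql to run the query, and describe for summary statistics); treat this as a sketch rather than the official solution:

# Create a temporary view of salaries_df named salaries_table
salaries_df.createOrReplaceTempView('salaries_table')

# Construct the "query"
query = '''SELECT job_title, salary_in_usd FROM salaries_table WHERE company_location == "CA"'''

# Apply the SQL "query"
canada_titles = spark.sql(query)

# Generate basic statistics
canada_titles.describe().show()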