Exercise

Analytics with SQL on DataFrames

SQL queries are often more concise than the equivalent chain of DataFrame operations. To run SQL against a DataFrame, you first register the DataFrame as a temporary view, which exposes it as a named table, and then run your queries against that table.

A SparkSession named spark and a DataFrame named salaries_df are already available in your workspace.

Instructions

  • Create a temporary view named "salaries_table" from the salaries_df DataFrame.
  • Construct a query that selects the "job_title" column for rows where company_location is Canada ("CA").
  • Run the SQL query and store the result in a new DataFrame, canada_titles.
  • Get a summary of the resulting table.