Exercise

Part 2: SQL Queries on DataFrame

The fifa_df DataFrame that we created carries schema information: the names and data types of its columns. This schema allows PySpark SQL to run SQL queries against the DataFrame. SQL queries are often more concise than the equivalent DataFrame operations. Before you can query a DataFrame with SQL, however, you need to register it as a temporary view, and then run your SQL queries against that view (see Running SQL Queries Programmatically).

In the second part, you'll create a temporary table from the fifa_df DataFrame and run a SQL query to extract the 'Age' column for players from Germany.

You already have a SparkSession spark and the DataFrame fifa_df available in your workspace.

Instructions
  • Create a temporary table fifa_df_table from the fifa_df DataFrame.
  • Construct a "query" to extract the "Age" column for players from Germany.
  • Apply the SQL "query" to the temporary view table and create a new DataFrame.
  • Compute basic statistics of the created DataFrame.
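
The steps above could be sketched roughly as follows. This is one possible approach, not the official solution. It assumes the country is stored in a column named Nationality, which the exercise text does not confirm, and the helper name germany_ages is purely illustrative. The spark and fifa_df objects are passed in rather than created, since the workspace already provides them.

```python
# Assumptions (not stated in the exercise): the country column is called
# "Nationality"; germany_ages is a hypothetical helper name.
VIEW_NAME = "fifa_df_table"
QUERY = f"SELECT Age FROM {VIEW_NAME} WHERE Nationality = 'Germany'"


def germany_ages(spark, fifa_df):
    """Register fifa_df as a temporary view and return the ages of German players."""
    # Step 1: expose the DataFrame to the SQL engine under a table name.
    fifa_df.createOrReplaceTempView(VIEW_NAME)
    # Step 2: run the query; spark.sql returns a new DataFrame.
    return spark.sql(QUERY)


# In the exercise workspace, where `spark` and `fifa_df` already exist:
# fifa_df_germany_age = germany_ages(spark, fifa_df)
# fifa_df_germany_age.describe().show()  # count, mean, stddev, min, max
```

Using createOrReplaceTempView rather than createTempView means the cell can be re-run without an error about the view already existing.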