Spark SQL Join
Complex joins are sometimes much easier to write in SQL. In this exercise, the join keys are already in the same format and precision, and we will use Spark SQL to perform the join.
This exercise is part of the course
Feature Engineering with PySpark
Exercise instructions
- Register the DataFrames as Spark SQL tables with createOrReplaceTempView, and name them df and walk_df respectively.
- In the join_sql string, set the left table to df and the right table to walk_df.
- Call spark.sql() on the join_sql string to perform the join.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Register dataframes as tables
____.createOrReplaceTempView(____)
____.createOrReplaceTempView(____)
# SQL to join dataframes
join_sql = """
SELECT
*
FROM ____
LEFT JOIN ____
ON df.longitude = walk_df.longitude
AND df.latitude = walk_df.latitude
"""
# Perform sql join
joined_df = spark.sql(____)