Spark SQL Join
Complex joins are sometimes much easier to write in SQL. In this exercise, the join keys are already in the same format and precision, and we will use Spark SQL to perform the join.
This exercise is part of the course
Feature Engineering with PySpark
Instructions
- Register the DataFrames as Spark SQL tables with createOrReplaceTempView, naming them df and walk_df respectively.
- In the join_sql string, set the left table to df and the right table to walk_df.
- Call spark.sql() on the join_sql string to perform the join.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Register dataframes as tables
____.createOrReplaceTempView(____)
____.createOrReplaceTempView(____)
# SQL to join dataframes
join_sql = 	"""
			SELECT 
				*
			FROM ____
			LEFT JOIN ____
			ON df.longitude = walk_df.longitude
			AND df.latitude = walk_df.latitude
			"""
# Perform sql join
joined_df = spark.sql(____)
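To see what a LEFT JOIN on both coordinate columns returns, the sketch below runs the same kind of query against toy data. It is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for a Spark session, and the id and walkscore columns and all row values are hypothetical, not taken from the exercise data.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for Spark SQL tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (id INTEGER, longitude REAL, latitude REAL)")
conn.execute("CREATE TABLE walk_df (longitude REAL, latitude REAL, walkscore INTEGER)")

# Hypothetical toy rows: row 2 of df has no matching coordinates in walk_df.
conn.executemany(
    "INSERT INTO df VALUES (?, ?, ?)",
    [(1, -93.25, 44.95), (2, -93.10, 44.90)],
)
conn.executemany(
    "INSERT INTO walk_df VALUES (?, ?, ?)",
    [(-93.25, 44.95, 80)],
)

# Left join on both keys: every row of df is kept, matched or not.
join_sql = """
    SELECT df.id, walk_df.walkscore
    FROM df
    LEFT JOIN walk_df
    ON df.longitude = walk_df.longitude
    AND df.latitude = walk_df.latitude
    ORDER BY df.id
"""
rows = conn.execute(join_sql).fetchall()
print(rows)  # [(1, 80), (2, None)]
```

The unmatched row from the left table is kept with an empty value for the right table's column, which is why the keys need the same format and precision on both sides before joining.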