Spark SQL Join
Complex joins are sometimes much easier to write in SQL. In this exercise, the join keys are already in the same format and precision, so we will use SparkSQL to perform the join directly.
This exercise is part of the course
Feature Engineering with PySpark
Exercise instructions
- Register the DataFrames as SparkSQL tables with createOrReplaceTempView, naming them df and walk_df respectively.
- In the join_sql string, set the left table to df and the right table to walk_df.
- Call spark.sql() on the join_sql string to perform the join.
Interactive hands-on exercise
Try to solve this exercise by completing the sample code.
# Register dataframes as tables
____.createOrReplaceTempView(____)
____.createOrReplaceTempView(____)
# SQL to join dataframes
join_sql = """
SELECT
*
FROM ____
LEFT JOIN ____
ON df.longitude = walk_df.longitude
AND df.latitude = walk_df.latitude
"""
# Perform sql join
joined_df = spark.sql(____)