Spark SQL Join
Complex joins are sometimes much easier to write in SQL. In this exercise, the join keys are already in the same format and precision, so the only remaining step is to perform the join itself with Spark SQL.
This exercise is part of the course
Feature Engineering with PySpark
Exercise instructions
- Register the DataFrames as Spark SQL tables with createOrReplaceTempView, naming them df and walk_df respectively.
- In the join_sql string, set the left table to df and the right table to walk_df.
- Call spark.sql() on the join_sql string to perform the join.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Register dataframes as tables
____.createOrReplaceTempView(____)
____.createOrReplaceTempView(____)
# SQL to join dataframes
join_sql = """
SELECT
*
FROM ____
LEFT JOIN ____
ON df.longitude = walk_df.longitude
AND df.latitude = walk_df.latitude
"""
# Perform sql join
joined_df = spark.sql(____)
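
To see what a LEFT JOIN on both longitude and latitude returns without starting a Spark session, the same query shape can be run against an in-memory SQLite database from Python's standard library. This is only a rough stand-in for Spark SQL, not the exercise solution: the table names mirror the temp views from the instructions, while the price and walkscore columns, the sample rows, and the ORDER BY clause (added only to make the row order predictable) are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for a Spark session
conn = sqlite3.connect(":memory:")

# Hypothetical tables named after the exercise's temp views
conn.execute("CREATE TABLE df (longitude REAL, latitude REAL, price INTEGER)")
conn.execute("CREATE TABLE walk_df (longitude REAL, latitude REAL, walkscore INTEGER)")

# Hypothetical sample rows: only the first df row has a match in walk_df
conn.executemany(
    "INSERT INTO df VALUES (?, ?, ?)",
    [(-93.1, 44.9, 250000), (-93.2, 45.0, 310000)],
)
conn.executemany(
    "INSERT INTO walk_df VALUES (?, ?, ?)",
    [(-93.1, 44.9, 72)],
)

# Same join shape as the exercise: df on the left, walk_df on the right
join_sql = """
SELECT *
FROM df
LEFT JOIN walk_df
ON df.longitude = walk_df.longitude
AND df.latitude = walk_df.latitude
ORDER BY df.price
"""

rows = conn.execute(join_sql).fetchall()
for row in rows:
    print(row)
```

Every row of the left table survives the join. The row with no matching coordinates in walk_df comes back with NULL (None in Python) in all of the right-hand columns, and SELECT * returns the longitude and latitude columns from both tables.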