Spark SQL Join
Complex joins are sometimes much easier to write in SQL. In this exercise, the join keys are already in the same format and precision, and we will use SparkSQL to perform the join.
This exercise is part of the course
Feature Engineering with PySpark
Exercise instructions
- Register the DataFrames as SparkSQL tables with createOrReplaceTempView, naming them df and walk_df respectively.
- In the join_sql string, set the left table to df and the right table to walk_df.
- Call spark.sql() on the join_sql string to perform the join.
Hands-on interactive exercise
Try this exercise by completing the sample code.
# Register dataframes as tables
____.createOrReplaceTempView(____)
____.createOrReplaceTempView(____)
# SQL to join dataframes
join_sql = """
SELECT
*
FROM ____
LEFT JOIN ____
ON df.longitude = walk_df.longitude
AND df.latitude = walk_df.latitude
"""
# Perform sql join
joined_df = spark.sql(____)
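
The behavior of the LEFT JOIN on the two coordinate columns can be previewed without a Spark session. The sketch below is an illustration only: it uses Python's built-in sqlite3 module as a stand-in for SparkSQL, and the rows, the id column, and the walkscore column are hypothetical, not taken from the course data. Only the table names df and walk_df and the join condition come from the exercise.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for a Spark session (assumption)
conn = sqlite3.connect(":memory:")

# Hypothetical tables named after the temp views in the exercise
conn.execute("CREATE TABLE df (id INTEGER, longitude REAL, latitude REAL)")
conn.execute("CREATE TABLE walk_df (longitude REAL, latitude REAL, walkscore INTEGER)")

# Made-up rows: only the first df row has matching coordinates in walk_df
conn.executemany(
    "INSERT INTO df VALUES (?, ?, ?)",
    [(1, -93.25, 44.97), (2, -93.10, 44.95)],
)
conn.executemany(
    "INSERT INTO walk_df VALUES (?, ?, ?)",
    [(-93.25, 44.97, 80)],
)

# Same join condition as the exercise, with df on the left and walk_df on the right
join_sql = """
SELECT df.id, walk_df.walkscore
FROM df
LEFT JOIN walk_df
ON df.longitude = walk_df.longitude
AND df.latitude = walk_df.latitude
ORDER BY df.id
"""

rows = conn.execute(join_sql).fetchall()
print(rows)  # [(1, 80), (2, None)]
```

Every row of the left table survives the join. The second row has no matching coordinates in the right table, so its walkscore comes back as NULL (None in Python). The exact equality on longitude and latitude is why the keys need to share the same format and precision before joining.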