Spark SQL Join
Complex joins are sometimes much easier to write in SQL. In this exercise, the join keys are already in the same format and precision, and we will use Spark SQL to perform the join.
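To see why the join keys need the same precision, consider that an equality join on coordinates only matches rows whose values are exactly equal. The sketch below is an illustration under assumptions, not part of the exercise: it uses Python's built-in sqlite3 module as a lightweight stand-in for Spark SQL, and the rows, the price and walkscore columns, and the rounding to two decimals are all made up for the example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Spark SQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (longitude REAL, latitude REAL, price INTEGER)")
conn.execute("CREATE TABLE walk_df (longitude REAL, latitude REAL, walkscore INTEGER)")

# Hypothetical rows: the second walk_df row carries extra decimal places.
conn.executemany(
    "INSERT INTO df VALUES (?, ?, ?)",
    [(-93.09, 44.95, 200000), (-93.12, 44.98, 350000)],
)
conn.executemany(
    "INSERT INTO walk_df VALUES (?, ?, ?)",
    [(-93.09, 44.95, 71), (-93.12004, 44.98001, 55)],
)

# Equality join on the raw keys: the row with mismatched precision finds no partner.
exact_sql = """
SELECT df.price, walk_df.walkscore
FROM df
LEFT JOIN walk_df
ON df.longitude = walk_df.longitude
AND df.latitude = walk_df.latitude
ORDER BY df.price
"""
exact_rows = conn.execute(exact_sql).fetchall()

# Same join after rounding both sides to a shared precision.
rounded_sql = """
SELECT df.price, walk_df.walkscore
FROM df
LEFT JOIN walk_df
ON ROUND(df.longitude, 2) = ROUND(walk_df.longitude, 2)
AND ROUND(df.latitude, 2) = ROUND(walk_df.latitude, 2)
ORDER BY df.price
"""
rounded_rows = conn.execute(rounded_sql).fetchall()

print(exact_rows)
print(rounded_rows)
```

With the raw keys, the LEFT JOIN keeps the second df row but leaves its walkscore empty; once both sides share a precision, every row finds its match.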
This exercise is part of the course
<Kurs>Feature Engineering with PySpark</Kurs>
Exercise instructions
- Register the Dataframes as SparkSQL tables with createOrReplaceTempView, name them df and walk_df respectively.
- In the join_sql string, set the left table to df and the right table to walk_df.
- Call spark.sql() on the join_sql string to perform the join.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Register dataframes as tables
____.createOrReplaceTempView(____)
____.createOrReplaceTempView(____)
# SQL to join dataframes
join_sql = """
SELECT
*
FROM ____
LEFT JOIN ____
ON df.longitude = walk_df.longitude
AND df.latitude = walk_df.latitude
"""
# Perform sql join
joined_df = spark.sql(____)