Joining flights with their destination airports
You've been hired as a data engineer for a global travel company. Your first task is to help the company improve its operations by analyzing flight data. You have two datasets: one containing details about flights (flights) and another with information about destination airports (airports). Both are already available in your workspace.
Your goal? Combine these datasets into a single DataFrame that links each flight to its destination airport.
This exercise is part of the course
Introduction to PySpark
Exercise instructions
- Examine the airports DataFrame. Note which key column will let you join airports to the flights table.
- Join the flights DataFrame with the airports DataFrame on the "dest" column. Save the result as flights_with_airports.
- Examine flights_with_airports. Note the new information that has been added.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Examine the data
airports.____()
# .withColumnRenamed() renames the "faa" column to "dest"
airports = airports.withColumnRenamed("faa", "dest")
# Join the DataFrames
flights_with_airports = ____
# Examine the new DataFrame
flights_with_airports.____