Joining flights with their destination airports
You've been hired as a data engineer for a global travel company. Your first task is to help the company improve its operations by analyzing flight data. You have two datasets, both already available in your workspace: one containing details about flights (flights) and another with information about destination airports (airports).
Your goal? Combine these datasets into a single DataFrame that links each flight to its destination airport.
This exercise is part of the course
Introduction to PySpark
Instructions
- Examine the airports DataFrame. Note which key column will let you join airports to the flights table.
- Join the flights DataFrame with the airports DataFrame on the "dest" column. Save the result as flights_with_airports.
- Examine flights_with_airports. Note the new information that has been added.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Examine the data
airports.____()
# .withColumnRenamed() renames the "faa" column to "dest"
airports = airports.withColumnRenamed("faa", "dest")
# Join the DataFrames
flights_with_airports = ____
# Examine the new DataFrame
flights_with_airports.____