
Joining flights with their destination airports

You've been hired as a data engineer for a global travel company. Your first task is to help the company improve its operations by analyzing flight data. Two datasets are already available in your workspace: flights, which contains details about individual flights, and airports, which contains information about the destination airports.

Your goal? Combine these datasets into a single dataset that links each flight to information about its destination airport.
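Linking each flight to its destination airport is a key-based join: every flight row is matched to the airport row that has the same destination code. The sketch below illustrates that idea in plain Python rather than PySpark, using a left join so that flights with no matching airport are kept. The toy rows, the `left_join` helper, and the assumption that airport codes are unique are all hypothetical and are not part of the course data.

```python
# Plain-Python illustration of a key-based left join (not PySpark).
# The toy rows below are hypothetical, not the course's data.

flights = [
    {"flight": 1, "dest": "SEA"},
    {"flight": 2, "dest": "LAX"},
    {"flight": 3, "dest": "XYZ"},  # no matching airport record
]
airports = [
    {"dest": "SEA", "name": "Seattle Tacoma Intl"},
    {"dest": "LAX", "name": "Los Angeles Intl"},
]


def left_join(left, right, key):
    """Attach right-hand columns to each left row whose key matches.

    Left rows with no match keep their own columns, and the right-hand
    columns are filled with None (similar to SQL NULL).
    Assumes each key appears at most once in `right`.
    """
    # Index the right-hand table by the join key for fast lookups.
    index = {row[key]: row for row in right}
    right_cols = {col for row in right for col in row if col != key}
    joined = []
    for row in left:
        match = index.get(row[key])
        extra = {col: (match[col] if match else None) for col in right_cols}
        joined.append({**row, **extra})
    return joined


flights_with_airports = left_join(flights, airports, "dest")
for row in flights_with_airports:
    print(row)
```

Spark's `DataFrame.join` performs the same matching, but distributed across partitions instead of through an in-memory dictionary.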

This exercise is part of the course

Introduction to PySpark


Exercise instructions

  • Examine the airports DataFrame. Note which key column will let you join airports to the flights table.
  • Join the flights DataFrame with the airports DataFrame on the "dest" column. Save the result as flights_with_airports.
  • Examine flights_with_airports. Note the airport information that has been added to each flight.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

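# Examine the data
airports.show()

# .withColumnRenamed() renames the "faa" column to "dest" so both DataFrames share a join key
airports = airports.withColumnRenamed("faa", "dest")

# Join the DataFrames on "dest"; a left outer join keeps flights without a matching airport
flights_with_airports = flights.join(airports, on="dest", how="leftouter")

# Examine the new DataFrame
flights_with_airports.show()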