Join the DataFrames

In the next two chapters you'll build a model that predicts whether a flight will be delayed, based on the flights data we've been working with. The model will also use information about the plane that flew each route, so the first step is to join the two tables: flights and planes!

This exercise is part of the course

Foundations of PySpark

Exercise instructions

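  • First, rename the year column of planes to plane_year so that it does not clash with the year column already in flights.
  • Create a new DataFrame called model_data by joining the flights table with planes, using the tailnum column as the key. The join is a left outer join, so every flight is kept even when no matching plane exists.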

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Rename year column
planes = planes.withColumnRenamed(____)

# Join the DataFrames
model_data = flights.join(____, on=____, how="leftouter")