Join the DataFrames
In the next two chapters you'll be working to build a model that predicts whether or not a flight will be delayed based on the flights data we've been working with. This model will also include information about the plane that flew the route, so the first step is to join the two tables: flights and planes!
This exercise is part of the Foundations of PySpark course.
Exercise instructions
- First, rename the year column of planes to plane_year to avoid duplicate column names.
- Create a new DataFrame called model_data by joining the flights table with planes using the tailnum column as the key.
Hands-on interactive exercise
Complete this sample code to finish the exercise.
# Rename year column
planes = planes.withColumnRenamed(____)
# Join the DataFrames
model_data = flights.join(____, on=____, how="leftouter")
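
To see why the rename matters and what a left outer join keeps, here is a minimal plain-Python sketch of the same two steps. It is not PySpark and not the exercise solution: the rows, the tail numbers, and the helper functions rename_column and left_outer_join are all made up for illustration, and they only mimic what withColumnRenamed and join with how="leftouter" are expected to do on small in-memory data.

```python
# Illustrative rows only (hypothetical data, not the course dataset).
flights = [
    {"year": 2014, "tailnum": "N846VA", "dest": "LAX"},
    {"year": 2014, "tailnum": "N000XX", "dest": "HNL"},  # no matching plane
]
planes = [
    {"tailnum": "N846VA", "year": 2011, "manufacturer": "AIRBUS"},
]


def rename_column(rows, old, new):
    """Return copies of the rows with key `old` renamed to `new`."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]


def left_outer_join(left, right, on):
    """Keep every left row; fill right-side columns with None when no match."""
    right_cols = [c for c in right[0] if c != on] if right else []
    index = {}
    for row in right:
        index.setdefault(row[on], []).append(row)
    joined = []
    for lrow in left:
        matches = index.get(lrow[on])
        if matches:
            for rrow in matches:
                joined.append({**lrow, **{c: rrow[c] for c in right_cols}})
        else:
            joined.append({**lrow, **{c: None for c in right_cols}})
    return joined


# Without the rename, both tables would contribute a column named "year".
planes_renamed = rename_column(planes, "year", "plane_year")
model_data = left_outer_join(flights, planes_renamed, on="tailnum")

for row in model_data:
    print(row)
```

In this sketch the first flight picks up plane_year and manufacturer from its matching plane, while the second flight is still kept, with those columns set to None. That is the behavior a left outer join is chosen for here: no flight is dropped just because its plane is missing from the planes table.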