BaşlayınÜcretsiz Başlayın

Making a Boolean

Consider that you're modeling a yes or no question: is the flight late? However, your data contains the arrival delay in minutes for each flight. Thus, you'll need to create a boolean column which indicates whether the flight was late or not!

Bu egzersiz

Foundations of PySpark

kursunun bir parçasıdır
Kursu Görüntüle

Egzersiz talimatları

  • Use the .withColumn() method to create the column is_late. This column is equal to model_data.arr_delay > 0.
  • Convert this column to an integer column so that you can use it in your model and name it label (this is the default name for the response variable in Spark's machine learning routines).
  • Filter out missing values (this has been done for you).

Uygulamalı interaktif egzersiz

Bu örnek kodu tamamlayarak bu egzersizi bitirin.

# Create is_late
model_data = model_data.withColumn("is_late", ____)

# Convert to an integer
model_data = model_data.withColumn("label", ____)

# Remove missing values
model_data = model_data.filter("arr_delay is not NULL and dep_delay is not NULL and air_time is not NULL and plane_year is not NULL")
Kodu Düzenle ve Çalıştır