
Removing columns and rows

You previously loaded airline flight data from a CSV file. You're going to develop a model which will predict whether or not a given flight will be delayed.

In this exercise you need to trim those data down by:

  1. removing an uninformative column and
  2. removing rows which do not have information about whether or not a flight was delayed.

The data are available as flights.

Note: You might find it useful to review the slides from the lessons in the Slides panel next to the IPython Shell.

This exercise is part of the course Machine Learning with PySpark.

Exercise instructions

  • Remove the flight column.
  • Find out how many records have missing values in the delay column.
  • Remove records with missing values in the delay column.
  • Remove records with missing values in any column and get the number of remaining rows.

Hands-on interactive exercise

Complete this sample code to finish the exercise.

# Remove the 'flight' column
flights_drop_column = flights.____(____)

# Number of records with missing 'delay' values
flights_drop_column.____('delay IS NULL').____()

# Remove records with missing 'delay' values
flights_valid_delay = flights_drop_column.____(____)

# Remove records with missing values in any column and get the number of remaining rows
flights_none_missing = flights_valid_delay.____()
print(flights_none_missing.____())