1. Learn
  2. /
  3. Courses
  4. /
  5. Machine Learning with PySpark

Exercise

Removing columns and rows

You previously loaded airline flight data from a CSV file. You're going to develop a model which will predict whether or not a given flight will be delayed.

In this exercise you need to trim those data down by:

  1. removing an uninformative column and
  2. removing rows which do not have information about whether or not a flight was delayed.

The data are available as flights.

Note:: You might find it useful to revise the slides from the lessons in the Slides panel next to the IPython Shell.

Instructions

100 XP
  • Remove the flight column.
  • Find out how many records have missing values in the delay column.
  • Remove records with missing values in the delay column.
  • Remove records with missing values in any column and get the number of remaining rows.