
Flight duration model: Adding departure time

In the previous exercise, the departure time was bucketed and converted to dummy variables. Now you'll include those dummy variables in a regression model for flight duration.

The data are in flights. The km, org_dummy and depart_dummy columns have been assembled into features, where km is index 0, org_dummy runs from index 1 to 7 and depart_dummy from index 8 to 14.
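To make the index layout concrete, here is a small plain-Python sketch that lists which position in features corresponds to which column. The bucket labels are an assumption: they suppose eight 3-hour departure buckets with the last one (21:00 to 24:00) dropped as the reference level, which matches the index range given above but is not stated explicitly.

```python
# Assumption: eight 3-hour departure buckets, with the last one (21:00-24:00)
# dropped as the reference level, leaving seven dummy columns.
depart_buckets = [
    "00:00-03:00", "03:00-06:00", "06:00-09:00", "09:00-12:00",
    "12:00-15:00", "15:00-18:00", "18:00-21:00",
]

feature_names = ["km"]                                           # index 0
feature_names += [f"org_dummy[{i}]" for i in range(7)]           # indices 1 to 7
feature_names += [f"depart_dummy[{b}]" for b in depart_buckets]  # indices 8 to 14

# Print each vector position next to the column it represents
for index, name in enumerate(feature_names):
    print(index, name)
```

Under that assumption, index 9 is the dummy for departures between 03:00 and 06:00.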

The data have been split into training and testing sets and a linear regression model, regression, has been built on the training data. Predictions have been made on the testing data and are available as predictions.
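Predictions on a testing set are commonly scored with the root mean squared error (RMSE). The sketch below computes it in plain Python on made-up numbers, just to show what the metric measures. In PySpark the same metric is provided by `RegressionEvaluator` in `pyspark.ml.evaluation`. The function name `rmse` and the sample values are illustrative and not part of the exercise.

```python
import math

def rmse(labels, predictions):
    """Root mean squared error between two equal-length sequences."""
    if len(labels) != len(predictions) or len(labels) == 0:
        raise ValueError("labels and predictions must be non-empty and equal in length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(labels, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical flight durations in minutes (not taken from the flights data)
actual = [100.0, 150.0, 200.0]
predicted = [110.0, 140.0, 200.0]

print("RMSE:", rmse(actual, predicted))
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.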

This exercise is part of the course

Machine Learning with PySpark

Exercise instructions

  • Find the RMSE for predictions on the testing data.
  • Find the average time spent on the ground for flights departing from OGG between 21:00 and 24:00.
  • Find the average time spent on the ground for flights departing from OGG between 03:00 and 06:00.
  • Find the average time spent on the ground for flights departing from JFK between 03:00 and 06:00.
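The last three instructions rely on how a linear model with dummy variables is read. At km = 0 the predicted duration is the time not spent covering distance, that is, time on the ground. The intercept alone gives this value for the reference levels (here OGG and the 21:00 to 24:00 bucket), and every other combination adds the coefficients of its active dummies. The sketch below illustrates the arithmetic with hypothetical numbers. The helper `ground_time`, the coefficient values and the position assumed for the JFK dummy are all made up for illustration and do not come from the fitted model.

```python
def ground_time(intercept, coefficients, active_indices=()):
    """Predicted duration at km = 0: intercept plus the active dummy coefficients."""
    return intercept + sum(coefficients[i] for i in active_indices)

# Hypothetical values for illustration only, NOT output from the fitted model.
intercept = 10.0
coefficients = [0.0] * 15
coefficients[0] = 0.075   # minutes per km (plays no role at km = 0)
coefficients[3] = 6.0     # hypothetical position and value of the JFK dummy
coefficients[9] = -4.0    # dummy for departures between 03:00 and 06:00

JFK_INDEX = 3     # assumed; look up the real position in the org_dummy encoding
NIGHT_INDEX = 9   # 03:00-06:00 bucket, per the feature layout

# Reference levels only: OGG, departing 21:00-24:00
print(ground_time(intercept, coefficients))
# OGG, departing 03:00-06:00
print(ground_time(intercept, coefficients, [NIGHT_INDEX]))
# JFK, departing 03:00-06:00
print(ground_time(intercept, coefficients, [JFK_INDEX, NIGHT_INDEX]))
```

A fitted PySpark `LinearRegressionModel` exposes the corresponding values as its `intercept` and `coefficients` attributes.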

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Find the RMSE on testing data
from pyspark.ml.____ import ____
rmse = ____(____).____(____)
print("The test RMSE is", rmse)

# Average minutes on ground at OGG for flights departing between 21:00 and 24:00
avg_eve_ogg = regression.____
print(avg_eve_ogg)

# Average minutes on ground at OGG for flights departing between 03:00 and 06:00
avg_night_ogg = regression.____ + regression.____[9]
print(avg_night_ogg)

# Average minutes on ground at JFK for flights departing between 03:00 and 06:00
avg_night_jfk = regression.____ + regression.____[____] + regression.____[____]
print(avg_night_jfk)