Aan de slagGa gratis aan de slag

Flight duration model: Adding departure time

In the previous exercise the departure time was bucketed and converted to dummy variables. Now you're going to include those dummy variables in a regression model for flight duration.

The data are in flights. The km, org_dummy and depart_dummy columns have been assembled into features, where km is index 0, org_dummy runs from index 1 to 7 and depart_dummy from index 8 to 14.

The data have been split into training and testing sets and a linear regression model, regression, has been built on the training data. Predictions have been made on the testing data and are available as predictions.

Deze oefening maakt deel uit van de cursus

Machine Learning with PySpark

Cursus bekijken

Oefeninstructies

  • Find the RMSE for predictions on the testing data.
  • Find the average time spent on the ground for flights departing from OGG between 21:00 and 24:00.
  • Find the average time spent on the ground for flights departing from OGG between 03:00 and 06:00.
  • Find the average time spent on the ground for flights departing from JFK between 03:00 and 06:00.

Praktische interactieve oefening

Probeer deze oefening eens door deze voorbeeldcode in te vullen.

# Find the RMSE on testing data
from pyspark.ml.____ import ____
rmse = ____(____).____(____)
print("The test RMSE is", rmse)

# Average minutes on ground at OGG for flights departing between 21:00 and 24:00
avg_eve_ogg = regression.____
print(avg_eve_ogg)

# Average minutes on ground at OGG for flights departing between 03:00 and 06:00
avg_night_ogg = regression.____ + regression.____[9]
print(avg_night_ogg)

# Average minutes on ground at JFK for flights departing between 03:00 and 06:00
avg_night_jfk = regression.____ + regression.____[____] + regression.____[____]
print(avg_night_jfk)
Code bewerken en uitvoeren