Flight duration model: Just distance

In this exercise you'll build a regression model to predict flight duration (the duration column).

For the moment you'll keep the model simple, including only the distance of the flight (the km column) as a predictor.
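To make the idea of a single-predictor model concrete, here is a minimal pure-Python sketch of a least-squares line fit. It is not the PySpark code the exercise asks for, and the numbers are made up for illustration (they are not taken from flights); the function name fit_simple_linear and the use of minutes as the duration unit are assumptions.

```python
def fit_simple_linear(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalised)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical distances (km) and durations (assumed minutes), NOT real flights data
km = [500.0, 1000.0, 1500.0, 2000.0]
duration = [80.0, 120.0, 160.0, 200.0]

intercept, slope = fit_simple_linear(km, duration)
print(f"duration ~ {intercept:.2f} + {slope:.4f} * km")
```

With one predictor, the slope reads as the extra duration per additional kilometre, and the intercept as the fixed part of a flight's duration that does not depend on distance.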

The data are in flights. The first few records are displayed in the terminal. These data have also been split into training and testing sets and are available as flights_train and flights_test.

This exercise is part of the course

Machine Learning with PySpark

Exercise instructions

  • Create a linear regression object. Specify the name of the label column. Fit it to the training data.
  • Make predictions on the testing data.
  • Create a regression evaluator object and use it to evaluate RMSE on the testing data.
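The last instruction relies on RMSE (root mean squared error), which PySpark's RegressionEvaluator reports by default. As a rough illustration of what that number means, the sketch below computes it by hand in plain Python. The helper name rmse and the sample values are hypothetical and are not drawn from flights_test.

```python
import math


def rmse(labels, predictions):
    """Root mean squared error between observed labels and predictions."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("labels and predictions must be non-empty and equal in length")
    squared_errors = [(y - p) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical observed durations and model predictions (illustrative only)
observed = [100.0, 150.0, 200.0]
predicted = [110.0, 140.0, 200.0]

error = rmse(observed, predicted)
print(f"RMSE: {error:.3f}")
```

RMSE is expressed in the same units as the label, so a smaller value on the testing data means the predicted durations sit closer to the observed ones.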

Interactive hands-on exercise

Try to solve this exercise by completing the sample code.

from pyspark.ml.regression import LinearRegression
from pyspark.ml.evaluation import RegressionEvaluator

# Create a regression object and train on training data
regression = ____(____).____(____)

# Create predictions for the testing data and take a look at the predictions
predictions = ____.____(____)
predictions.select('duration', 'prediction').show(5, False)

# Calculate the RMSE
____(____).____(predictions)