Flight duration model: Just distance
In this exercise you'll build a regression model to predict flight duration (the duration column).
For the moment you'll keep the model simple, including only the distance of the flight (the km column) as a predictor.
The data are in flights. The first few records are displayed in the terminal. These data have also been split into training and testing sets and are available as flights_train and flights_test.
Questo esercizio fa parte del corso
Machine Learning with PySpark
Istruzioni dell'esercizio
- Create a linear regression object. Specify the name of the label column. Fit it to the training data.
- Make predictions on the testing data.
- Create a regression evaluator object and use it to evaluate RMSE on the testing data.
Esercizio pratico interattivo
Prova a risolvere questo esercizio completando il codice di esempio.
from pyspark.ml.regression import LinearRegression
from pyspark.ml.evaluation import RegressionEvaluator
# Create a regression object and train on training data
regression = ____(____).____(____)
# Create predictions for the testing data and take a look at the predictions
predictions = ____.____(____)
predictions.select('duration', 'prediction').show(5, False)
# Calculate the RMSE
____(____).____(predictions)