BaşlayınÜcretsiz Başlayın

Flight duration model: Pipeline stages

You're going to create the stages for the flights duration model pipeline. You will use these in the next exercise to build a pipeline and to create a regression model.

The StringIndexer, OneHotEncoder, VectorAssembler and LinearRegression classes are already imported.

Bu egzersiz

Machine Learning with PySpark

kursunun bir parçasıdır
Kursu Görüntüle

Egzersiz talimatları

  • Create an indexer to convert the 'org' column into an indexed column called 'org_idx'.
  • Create a one-hot encoder to convert the 'org_idx' and 'dow' columns into dummy variable columns called 'org_dummy' and 'dow_dummy'.
  • Create an assembler which will combine the 'km' column with the two dummy variable columns. The output column should be called 'features'.
  • Create a linear regression object to predict flight duration.

You might find it useful to revisit the slides from the lessons in the Slides panel next to the IPython Shell.

Uygulamalı interaktif egzersiz

Bu örnek kodu tamamlayarak bu egzersizi bitirin.

# Convert categorical strings to index values
indexer = ____(____)

# One-hot encode index values
onehot = ____(
    inputCols=____,
    outputCols=____
)

# Assemble predictors into a single column
assembler = ____(inputCols=____, outputCol=____)

# A linear regression object
regression = ____(labelCol=____)
Kodu Düzenle ve Çalıştır