
Delayed flights with a Random Forest

In this exercise you'll bring together cross validation and ensemble methods. You'll be training a Random Forest classifier to predict delayed flights, using cross validation to choose the best values for model parameters.

You'll find good values for the following parameters:

  • featureSubsetStrategy: the number of features to consider for splitting at each node, and
  • maxDepth: the maximum depth of each tree, that is, the maximum number of splits along any branch.
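To make the two parameters concrete, here is a minimal plain-Python sketch of what each setting implies. The helper `features_per_split`, the feature count of 30, and the ceiling rounding are illustrative assumptions (the rounding reflects how Spark is understood to size the subsets, so check the Spark documentation for exact behaviour). The leaf counts are simply the upper bound for a binary tree of a given depth.

```python
import math

def features_per_split(strategy: str, num_features: int) -> int:
    """Hypothetical helper: candidate features examined at each node.

    Ceiling rounding is an assumption about how the subsets are sized.
    """
    if strategy == "all":
        return num_features
    if strategy == "onethird":
        return math.ceil(num_features / 3)
    if strategy == "sqrt":
        return math.ceil(math.sqrt(num_features))
    if strategy == "log2":
        return max(1, math.ceil(math.log2(num_features)))
    raise ValueError(f"unknown strategy: {strategy}")

# Assumed feature count, chosen only for illustration.
num_features = 30

subset_sizes = {
    s: features_per_split(s, num_features)
    for s in ["all", "onethird", "sqrt", "log2"]
}
print(subset_sizes)  # {'all': 30, 'onethird': 10, 'sqrt': 6, 'log2': 5}

# A binary tree with at most d splits along any branch has at most 2**d leaves.
max_leaves = {depth: 2 ** depth for depth in [2, 5, 10]}
print(max_leaves)  # {2: 4, 5: 32, 10: 1024}
```

Smaller feature subsets make individual trees cheaper and less correlated with each other, while larger depths let each tree fit finer structure at the cost of training time.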

Unfortunately, building this model takes too long, so we won't be running the .fit() method on the cross-validator.

The RandomForestClassifier class has already been imported into the session.

This exercise is part of the course Machine Learning with PySpark.

Exercise instructions

  • Create a random forest classifier object.
  • Create a parameter grid builder object. Add grid points for the featureSubsetStrategy and maxDepth parameters.
  • Create a binary classification evaluator.
  • Create a cross-validator object, specifying the estimator, parameter grid and evaluator. Choose 5-fold cross validation.
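A rough way to see why fitting is slow is to count the models that a grid search with cross validation trains. The sketch below uses plain Python rather than PySpark and takes the grid values from this exercise's sample code (four featureSubsetStrategy values, three maxDepth values). The variable names are illustrative, and the extra final fit assumes the cross-validator refits the best parameter combination on the full training data.

```python
from itertools import product

# Grid values taken from the exercise's sample code.
feature_subset_strategies = ['all', 'onethird', 'sqrt', 'log2']
max_depths = [2, 5, 10]
num_folds = 5

# Every combination of the two parameters is one grid point.
grid = [
    {"featureSubsetStrategy": strategy, "maxDepth": depth}
    for strategy, depth in product(feature_subset_strategies, max_depths)
]

# Each grid point is trained once per fold.
fits_during_search = len(grid) * num_folds

# Assumption: one more fit of the best combination on all training data.
total_fits = fits_during_search + 1

print(len(grid), fits_during_search, total_fits)  # 12 60 61
```

Each of those fits is itself a whole forest of trees, which is why the search is expensive even for a modest grid.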

Hands-on interactive exercise

Complete this sample code to finish the exercise.

# Create a random forest classifier
forest = ____()

# Create a parameter grid
params = ____() \
            .____(____, ['all', 'onethird', 'sqrt', 'log2']) \
            .____(____, [2, 5, 10]) \
            .____()

# Create a binary classification evaluator
evaluator = ____()

# Create a cross-validator
cv = ____(____, ____, ____, ____)