Building a Regression Model

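One of the great things about the PySpark ML module is that most algorithms can be tried and tested without changing much code. Random Forest Regression is a fairly simple ensemble model that fits with bagging: each tree is trained independently on a bootstrap sample of the data, and the trees' predictions are averaged. Another tree-based ensemble model is Gradient Boosted Trees, which fits with a different approach called boosting: trees are trained one after another, each correcting the errors of the ensemble built so far. In this exercise, let's train a GBTRegressor.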
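To make the boosting idea concrete, here is a minimal pure-Python sketch for squared-error loss on a single feature. It is not how Spark implements GBTRegressor; the helper names fit_stump and boost, the learning rate, and the toy data are all assumptions chosen for illustration. Each round fits a one-split tree (a stump) to the current residuals and adds a scaled copy of it to the ensemble.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) to the residuals by least squares."""
    best = None
    # Try every distinct value except the largest as a threshold,
    # so both sides of the split are always non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Sequentially fit stumps to the residuals of the ensemble built so far."""
    base = sum(ys) / len(ys)  # start from the mean of the labels
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(predict, xs, ys):
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Toy data (made up): y = x^2 on ten points.
xs = [float(i) for i in range(10)]
ys = [x * x for x in xs]

mean_y = sum(ys) / len(ys)
mse_baseline = mse(lambda x: mean_y, xs, ys)  # predict the mean everywhere
mse_boosted = mse(boost(xs, ys), xs, ys)      # boosted stumps
print(mse_baseline, mse_boosted)
```

Bagging would instead fit each stump to an independent bootstrap sample of (xs, ys) and average the results, with no dependence between the trees.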

This exercise is part of the course

Feature Engineering with PySpark

Instructions

Import GBTRegressor from pyspark.ml.regression, which, you will notice, is the same module as RandomForestRegressor.
Instantiate GBTRegressor with featuresCol set to the vector column of our features, named features; labelCol set to our dependent variable, SALESCLOSEPRICE; and the random seed set to 42.
Train the model by calling fit() on gbt with the imported training data, train_df.

Hands-on interactive exercise

Try this exercise by completing this sample code.

from ____ import ____

# Define a Gradient Boosted Trees (GBT) model.
gbt = ____(featuresCol=____,
                           labelCol=____,
                           predictionCol="Prediction_Price",
                           seed=____
                           )

# Train model.
model = gbt.fit(train_df)
