
Building a Regression Model

One of the great things about the PySpark ML module is that most algorithms can be tried and tested without changing much code. Random Forest Regression is a fairly simple ensemble model that uses bagging to fit. Gradient Boosted Trees is another tree-based ensemble model, which uses a different approach called boosting. In this exercise, let's train a GBTRegressor.
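To make the idea of boosting concrete, here is a minimal pure-Python sketch in which each new tree is fitted to the residuals left by the ensemble so far. It is not Spark's implementation: the toy data, the depth-1 "stump" learner, and the names fit_stump, boost, and mse are all assumptions made for illustration.

```python
# Toy 1-D regression data (made up for illustration).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]


def mean(values):
    return sum(values) / len(values)


def fit_stump(xs, targets):
    """Fit a depth-1 regression tree: choose the split with the lowest squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Boosting for squared error: every stump is fitted to the current residuals."""
    base = mean(ys)  # start from a constant prediction
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x)
                       for p, x in zip(predictions, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model):
    return mean([(y - model(x)) ** 2 for x, y in zip(xs, ys)])


baseline = boost(xs, ys, n_rounds=0)   # constant prediction only
boosted = boost(xs, ys, n_rounds=20)   # 20 sequentially fitted stumps
```

Bagging, by contrast, trains its trees independently on resampled data and averages them, so the trees never see each other's errors.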

This exercise is part of the course

Feature Engineering with PySpark


Exercise instructions

  • Import GBTRegressor from pyspark.ml.regression, which you will notice is the same module that contains RandomForestRegressor.
  • Instantiate GBTRegressor with featuresCol set to the vector column of our features, named features; labelCol set to our dependent variable, SALESCLOSEPRICE; and the random seed set to 42.
  • Train the model by calling fit() on gbt with the imported training data, train_df.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

from ____ import ____

# Configure a Gradient Boosted Trees (GBT) regressor.
gbt = ____(featuresCol=____,
           labelCol=____,
           predictionCol="Prediction_Price",
           seed=____)

# Train model.
model = gbt.fit(train_df)