Building a Regression Model
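One of the great things about PySpark's ML module is that most algorithms can be tried and tested without changing much code. Random Forest Regression is a fairly simple ensemble model that uses bagging to fit: many trees are trained independently on bootstrap samples of the data and their predictions are averaged. Gradient Boosted Trees is another tree-based ensemble model, but it uses a different approach called boosting: trees are trained one after another, each one fit to correct the errors of the ensemble built so far. In this exercise, let's train a GBTRegressor.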
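To make the bagging versus boosting contrast concrete, here is a toy, pure-Python sketch that needs no Spark. It uses one-split regression "stumps" as the base trees. It is only an illustration of the two ideas, not how Spark implements RandomForestRegressor or GBTRegressor (Spark grows deeper trees and offers many more options). All function names here (fit_stump, fit_bagged, fit_boosted, and so on) are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree (a stump) minimizing squared error.

    Returns (threshold, left_value, right_value).
    """
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not right:  # the largest x leaves the right side empty
            continue
        lv, rv = mean(left), mean(right)
        sse = sum((y - lv) ** 2 for y in left) + sum((y - rv) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:  # all xs identical: fall back to a constant prediction
        m = mean(ys)
        return (xs[0], m, m)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_bagged(xs, ys, n_models=25, seed=42):
    """Bagging: independent stumps on bootstrap resamples, predictions averaged."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(predict_stump(s, x) for s in stumps)


def fit_boosted(xs, ys, n_models=25, learning_rate=0.5):
    """Boosting (squared loss): each new stump is fit to the current residuals."""
    base = mean(ys)
    stumps = []
    residuals = [y - base for y in ys]
    for _ in range(n_models):
        s = fit_stump(xs, residuals)
        stumps.append(s)
        # Shrink the new stump's contribution and update what is left to explain.
        residuals = [r - learning_rate * predict_stump(s, x) for x, r in zip(xs, residuals)]
    return lambda x: base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


# Toy data: a straight line, which a single stump cannot fit well.
xs = [float(i) for i in range(20)]
ys = [float(i) for i in range(20)]

stump = fit_stump(xs, ys)


def single(x):
    return predict_stump(stump, x)


bagged = fit_bagged(xs, ys)
boosted = fit_boosted(xs, ys)

print("single stump MSE:", mse(single, xs, ys))
print("bagged MSE:      ", mse(bagged, xs, ys))
print("boosted MSE:     ", mse(boosted, xs, ys))
```

The bagged models are independent of each other, so they could be trained in parallel; the boosted models depend on the residuals left by their predecessors, so they must be trained in sequence. With squared loss, each shrunken boosting step can only lower the training error, which is why the boosted ensemble ends up fitting the line more closely than any single stump.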
This exercise is part of the course Feature Engineering with PySpark.
Exercise instructions
- Import GBTRegressor from pyspark.ml.regression, which you will notice is the same module as RandomForestRegressor.
- Instantiate GBTRegressor with featuresCol set to features (the vector column holding our features), labelCol set to our dependent variable, SALESCLOSEPRICE, and the random seed set to 42.
- Train the model by calling fit() on gbt with the imported training data, train_df.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
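from pyspark.ml.regression import GBTRegressor

# Train a Gradient Boosted Trees (GBT) model.
gbt = GBTRegressor(featuresCol="features",
                   labelCol="SALESCLOSEPRICE",
                   predictionCol="Prediction_Price",
                   seed=42)

# Train model (train_df is the training DataFrame provided by the exercise).
model = gbt.fit(train_df)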