Building a Regression Model
One of the great things about the PySpark ML module is that most algorithms can be tried and tested without changing much code. Random Forest Regression is a fairly simple ensemble model that uses bagging to fit. Another tree-based ensemble model is Gradient Boosted Trees, which uses a different approach, called boosting, to fit. In this exercise, let's train a GBTRegressor.
This exercise is part of the course
Feature Engineering with PySpark
Exercise instructions
- Import GBTRegressor from pyspark.ml.regression, which you will notice is the same module as RandomForestRegressor.
- Instantiate GBTRegressor with featuresCol set to the vector column of our features, named features; labelCol set to our dependent variable, SALESCLOSEPRICE; and the random seed set to 42.
- Train the model by calling fit() on gbt with the imported training data, train_df.
Hands-on interactive exercise
Try this exercise by completing this sample code.
from ____ import ____
# Instantiate a Gradient Boosted Trees (GBT) regressor.
gbt = ____(featuresCol=____,
           labelCol=____,
           predictionCol="Prediction_Price",
           seed=____)
# Train model.
model = gbt.fit(train_df)