Building a Regression Model
One of the great things about the PySpark ML module is that most algorithms can be tried and tested without changing much code. Random Forest Regression is a fairly simple ensemble model that fits its trees using bagging. Another tree-based ensemble model is Gradient Boosted Trees, which fits its trees using a different approach called boosting. In this exercise, let's train a GBTRegressor.
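To make the bagging versus boosting distinction concrete: bagging fits each tree independently on a bootstrap sample and averages the results, while boosting fits trees one after another, each to the errors left by the ones before it. The sketch below illustrates the boosting idea for squared error in plain Python, using one-split "stumps" on a single feature. The helper names (fit_stump, boost, mse), the step size, and the toy data are all hypothetical and chosen for illustration; this is not how Spark implements GBTRegressor internally.

```python
# Illustrative gradient boosting for squared error with one-split stumps.
# Pure Python sketch under simplifying assumptions (one feature, toy data).

def fit_stump(xs, residuals):
    """Return the one-threshold predictor that best fits the residuals."""
    best = None
    # Skip the largest x so the right-hand side of the split is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, rounds, step_size=0.5):
    """Fit stumps sequentially, each one to the current residuals."""
    base = sum(ys) / len(ys)          # start from the mean of the labels
    stumps = []
    preds = [base] * len(ys)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Move predictions a fraction of the way toward the residual fit.
        preds = [p + step_size * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + step_size * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical toy data: y = x^2 on 20 points.
xs = [float(i) for i in range(20)]
ys = [x * x for x in xs]

mse_1 = mse(boost(xs, ys, rounds=1), xs, ys)
mse_20 = mse(boost(xs, ys, rounds=20), xs, ys)
print("training MSE after 1 round:", mse_1)
print("training MSE after 20 rounds:", mse_20)
```

Because each round fits what the previous rounds got wrong, the training error shrinks as rounds are added, which is the sequential behaviour that separates boosting from the independent trees of a random forest.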
This exercise is part of the course
Feature Engineering with PySpark
Instructions
- Import GBTRegressor from pyspark.ml.regression, which you will notice is the same module as RandomForestRegressor.
- Instantiate GBTRegressor with featuresCol set to the vector column of our features, named features; labelCol set to our dependent variable, SALESCLOSEPRICE; and the random seed set to 42.
- Train the model by calling fit() on gbt with the imported training data, train_df.
Hands-on interactive exercise
Try this exercise by completing this sample code.
from ____ import ____
# Train a Gradient Boosted Trees (GBT) model.
gbt = ____(featuresCol=____,
           labelCol=____,
           predictionCol="Prediction_Price",
           seed=____
           )
# Train model.
model = gbt.fit(train_df)