Evaluating & Comparing Algorithms
Now that we've created a new model with GBTRegressor, it's time to compare it against our baseline, RandomForestRegressor. To do this, we will compare the predictions of both models to the actual data and calculate RMSE and R^2.
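As a refresher, RMSE is the square root of the mean squared prediction error, and R^2 is one minus the ratio of the residual sum of squares to the total sum of squares. The toy numbers below are illustrative only (they are not from the course data) and show both metrics computed by hand in plain Python:

```python
# Hand-computed RMSE and R^2 on toy numbers (illustrative, not course data).
import math

actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 310.0]

# RMSE: square root of the mean of the squared errors
mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
rmse = math.sqrt(mse)

# R^2: 1 - (residual sum of squares / total sum of squares)
mean_actual = sum(actual) / len(actual)
ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
ss_tot = sum((a - mean_actual) ** 2 for a in actual)
r2 = 1 - ss_res / ss_tot
```

An RMSE of 10 here means the predictions are off by 10 price units on average (in the same units as the label), while an R^2 near 1 means the model explains most of the variance in the actual values.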
This exercise is part of the course Feature Engineering with PySpark.
Exercise instructions
- Import RegressionEvaluator from pyspark.ml.evaluation so it is available for use later.
- Initialize RegressionEvaluator by setting labelCol to our actual data, SALESCLOSEPRICE, and predictionCol to our predicted data, Prediction_Price.
- To calculate our metrics, call evaluate on evaluator with the prediction values preds and a dictionary with key evaluator.metricName and value 'rmse'; do the same for the 'r2' metric.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
from ____ import ____
# Select columns to compute test error
evaluator = ____(____=____,
                 ____=____)
# Dictionary of model predictions to loop over
models = {'Gradient Boosted Trees': gbt_predictions, 'Random Forest Regression': rfr_predictions}
for key, preds in models.items():
    # Create evaluation metrics
    rmse = evaluator.____(____, {____: ____})
    r2 = evaluator.____(____, {____: ____})

    # Print Model Metrics
    print(key + ' RMSE: ' + str(rmse))
    print(key + ' R^2: ' + str(r2))