Tell Spark how to tune your ALS model
Now we'll need to create a ParamGrid to tell Spark which hyperparameters we want it to tune and how to tune them, and then build an evaluator so Spark knows how to measure the algorithm's performance.
This exercise is part of the course Building Recommendation Engines with PySpark.
Exercise instructions
- Import RegressionEvaluator from pyspark.ml.evaluation, and ParamGridBuilder and CrossValidator from pyspark.ml.tuning (CrossValidator isn't called in this exercise; see the sketch after this list).
- Build a ParamGrid called param_grid using the ParamGridBuilder provided. Call the .addGrid() method for each hyperparameter, passing the name of the model and the name of the hyperparameter (ex: .addGrid(als.rank, [])). Do this for the rank, maxIter, and regParam hyperparameters, along with the respective lists of values that Spark should try:
  - rank: [10, 50, 100, 150]
  - maxIter: [5, 50, 100, 200]
  - regParam: [.01, .05, .1, .15]
- Create a RegressionEvaluator called evaluator. Set the metricName to "rmse", set the labelCol to "rating", and set the predictionCol to "prediction" so Spark knows what to call the column of generated predictions.
- Run len(param_grid) to confirm that the param_grid was created and that the right number of hyperparameter combinations will be tested. It should equal the number of rank values * the number of maxIter values * the number of regParam values in the ParamGridBuilder (4 * 4 * 4 = 64 here).
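Although CrossValidator is imported in this exercise, it isn't used yet. For orientation, here is a minimal sketch of how the estimator, grid, and evaluator typically come together in a later step; the variable name cv and numFolds=5 are illustrative assumptions, not part of this exercise:

# Sketch only: combining the pieces built in this exercise (cv and numFolds are assumed)
cv = CrossValidator(estimator=als,                  # the ALS model being tuned
                    estimatorParamMaps=param_grid,  # hyperparameter combinations to try
                    evaluator=evaluator,            # RMSE evaluator defined in this exercise
                    numFolds=5)                     # assumed number of cross-validation folds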
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import the requisite items
from pyspark.ml.evaluation import ____
from pyspark.ml.____ import ____, ____
# Add hyperparameters and their respective values to param_grid
____ = ParamGridBuilder() \
.addGrid(als.rank, [____, ____, ____, ____]) \
.addGrid(als.____, [____, ____, ____, ____]) \
.addGrid(als.____, [____, ____, ____, ____]) \
.build()
# Define evaluator as RMSE and print length of param_grid
____ = RegressionEvaluator(metricName="____", labelCol="____", predictionCol="____")
print("Num models to be tested: ", len(param_grid))
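For reference, one possible completed version of the sample code above, assuming an ALS estimator named als is already defined in the exercise environment (it is not created in this snippet):

# Import the requisite items
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator

# Add hyperparameters and their respective values to param_grid
param_grid = ParamGridBuilder() \
    .addGrid(als.rank, [10, 50, 100, 150]) \
    .addGrid(als.maxIter, [5, 50, 100, 200]) \
    .addGrid(als.regParam, [.01, .05, .1, .15]) \
    .build()

# Define evaluator as RMSE and print length of param_grid
evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating", predictionCol="prediction")

print("Num models to be tested: ", len(param_grid))  # 4 * 4 * 4 = 64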