
Tell Spark how to tune your ALS model

Next, create a ParamGrid that tells Spark which hyperparameters to tune and which values to try for each one, then build an evaluator so Spark knows how to measure the algorithm's performance.

This exercise is part of the course

Building Recommendation Engines with PySpark


Exercise instructions

  • Import RegressionEvaluator from pyspark.ml.evaluation and ParamGridBuilder and CrossValidator from pyspark.ml.tuning.
  • Build a ParamGrid called param_grid using the ParamGridBuilder provided. Call the .addGrid() method once per hyperparameter, passing the hyperparameter as an attribute of the model (e.g., .addGrid(als.rank, [])) together with the list of values Spark should try. Do this for the rank, maxIter, and regParam hyperparameters, using the values provided here:
 rank: [10, 50, 100, 150]  
 maxIter: [5, 50, 100, 200]  
 regParam: [.01, .05, .1, .15]  
  • Create a RegressionEvaluator called evaluator. Set metricName to "rmse", set labelCol to "rating", and set predictionCol to "prediction", the column in which Spark will find the model's predictions.
  • Run len(param_grid) to confirm that param_grid was created and that the right number of hyperparameter combinations will be tested. It should equal the number of rank values * the number of maxIter values * the number of regParam values in the ParamGridBuilder (here, 4 * 4 * 4 = 64).
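
To see why the grid length is a product of the list lengths, here is a minimal plain-Python sketch of the idea behind a parameter grid. It does not use Spark: build_grid is a hypothetical helper written for illustration, and the dictionary simply holds the values listed in the instructions above. It only assumes that the grid contains every combination of the supplied values.

```python
from itertools import product

# Candidate values copied from the exercise instructions
grid_values = {
    "rank": [10, 50, 100, 150],
    "maxIter": [5, 50, 100, 200],
    "regParam": [0.01, 0.05, 0.1, 0.15],
}


def build_grid(values):
    """Hypothetical stand-in for a grid builder: one dict per combination."""
    names = list(values)
    # The Cartesian product pairs every value of one hyperparameter
    # with every value of the others.
    return [
        dict(zip(names, combo))
        for combo in product(*(values[name] for name in names))
    ]


combos = build_grid(grid_values)
print("Num models to be tested:", len(combos))
print("First combination:", combos[0])
```

Each extra value added to any one list increases the count by the product of the other two list lengths, which is why large grids become expensive to fit quickly.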

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import the requisite items
from pyspark.ml.evaluation import ____
from pyspark.ml.____ import ____, ____

# Add hyperparameters and their respective values to param_grid
____ = ParamGridBuilder() \
            .addGrid(als.rank, [____, ____, ____, ____]) \
            .addGrid(als.____, [____, ____, ____, ____]) \
            .addGrid(als.____, [____, ____, ____, ____]) \
            .build()
           
# Define evaluator as RMSE and print length of param_grid
____ = RegressionEvaluator(metricName="____", labelCol="____", predictionCol="____")
print("Num models to be tested: ", len(param_grid))
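
For intuition about what the evaluator measures, the sketch below computes root mean squared error (RMSE) by hand in plain Python. The rmse function and the sample ratings and predictions are made up for illustration and are not course data. In Spark, the evaluator would read these values from the label and prediction columns instead.

```python
import math


def rmse(ratings, predictions):
    """Root mean squared error between actual ratings and predicted ratings."""
    if not ratings or len(ratings) != len(predictions):
        raise ValueError("ratings and predictions must be non-empty and equal in length")
    squared_errors = [(r - p) ** 2 for r, p in zip(ratings, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical sample values, for illustration only
ratings = [4.0, 3.0, 5.0, 1.0]
predictions = [3.5, 3.0, 4.0, 2.0]

print("RMSE:", rmse(ratings, predictions))
```

A lower RMSE means the predictions sit closer to the actual ratings, so when comparing hyperparameter combinations the one with the smallest RMSE is preferred.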