
Tell Spark how to tune your ALS model

Now we'll create a ParamGrid to tell Spark which hyperparameters we want it to tune and which values to try, then build an evaluator so Spark knows how to measure the algorithm's performance.

This exercise is part of the course

Building Recommendation Engines with PySpark


Instructions

  • Import RegressionEvaluator from pyspark.ml.evaluation and ParamGridBuilder and CrossValidator from pyspark.ml.tuning.
  • Build a ParamGrid called param_grid using the ParamGridBuilder provided. Call the .addGrid() method for each hyperparameter, passing the model name and the hyperparameter name (e.g., .addGrid(als.rank, [...])). Do this for the rank, maxIter, and regParam hyperparameters, and provide the respective lists of values that Spark should try, as given here:
 rank: [10, 50, 100, 150]  
 maxIter: [5, 50, 100, 200]  
 regParam: [.01, .05, .1, .15]  
  • Create a RegressionEvaluator called evaluator. Set the metricName to "rmse", set the labelCol to "rating", and set the predictionCol to "prediction" so Spark knows what to call the predictions it generates.
  • Run len(param_grid) to confirm that the param_grid was created and that the right number of hyperparameter combinations will be tested. It should equal the number of rank values * the number of maxIter values * the number of regParam values in the ParamGridBuilder, as the quick check below shows.
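
With four values for each of the three hyperparameters, the grid size works out to:

 len(param_grid) = 4 rank values * 4 maxIter values * 4 regParam values = 64 models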

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Import the requisite items
from pyspark.ml.evaluation import ____
from pyspark.ml.____ import ____, ____

# Add hyperparameters and their respective values to param_grid
____ = ParamGridBuilder() \
            .addGrid(als.rank, [____, ____, ____, ____]) \
            .addGrid(als.____, [____, ____, ____, ____]) \
            .addGrid(als.____, [____, ____, ____, ____]) \
            .build()
           
# Define evaluator as RMSE and print length of param_grid
____ = RegressionEvaluator(metricName="____", labelCol="____", predictionCol="____") 
print ("Num models to be tested: ", len(param_grid))