Make a grid
Next, you need to create a grid of values to search over when looking for the optimal hyperparameters. The submodule pyspark.ml.tuning includes a class called ParamGridBuilder that does just that (maybe you're starting to notice a pattern here; PySpark has a submodule for just about everything!).
You'll need to use the .addGrid() and .build() methods to create a grid that you can use for cross-validation. The .addGrid() method takes a model parameter (an attribute of the model Estimator, lr, that you created a few exercises ago) and a list of values that you want to try. It returns the builder itself, so you can overwrite your grid variable with the result and call .addGrid() again.
The .build() method takes no arguments; it returns the finished grid, a list containing one parameter map for every combination of the values you added, which you'll use later.
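To make the semantics concrete, here is a minimal pure-Python sketch of what a grid builder does, assuming the behavior described above: each addGrid() call records a parameter and its candidate values, and build() returns one mapping per combination (the Cartesian product of all value lists). The class MiniGridBuilder and the plain string parameter names are hypothetical stand-ins for illustration, not PySpark's API; the real ParamGridBuilder keys its maps by Param objects such as lr.regParam.

```python
import itertools
from typing import Any, Dict, List, Sequence


class MiniGridBuilder:
    """Hypothetical stand-in for ParamGridBuilder (illustration only)."""

    def __init__(self) -> None:
        # Maps each parameter name to the list of values to try
        self._grid: Dict[str, List[Any]] = {}

    def addGrid(self, param: str, values: Sequence[Any]) -> "MiniGridBuilder":
        # Record the candidate values; returning self allows reassignment or chaining
        self._grid[param] = list(values)
        return self

    def build(self) -> List[Dict[str, Any]]:
        # One dict per combination: the Cartesian product of all value lists
        names = list(self._grid)
        return [dict(zip(names, combo))
                for combo in itertools.product(*self._grid.values())]


grid = MiniGridBuilder()
grid = grid.addGrid("regParam", [0.0, 0.01, 0.02])
grid = grid.addGrid("elasticNetParam", [0, 1])
grid = grid.build()
print(len(grid))  # 6 combinations (3 x 2)
print(grid[0])    # {'regParam': 0.0, 'elasticNetParam': 0}
```

The method names deliberately mirror PySpark's so the exercise steps map onto the sketch one to one; the number of combinations grows multiplicatively with each parameter you add.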
This exercise is part of the course Foundations of PySpark.
Exercise instructions
- Import the submodule pyspark.ml.tuning under the alias tune.
- Call the class constructor ParamGridBuilder() with no arguments. Save this as grid.
- Call the .addGrid() method on grid with lr.regParam as the first argument and np.arange(0, .1, .01) as the second argument. The second argument calls a function from the numpy module (imported as np) that creates an array of numbers starting at 0 and increasing in steps of .01, stopping before .1 (the stop value is excluded). Overwrite grid with the result.
- Update grid again by calling the .addGrid() method a second time, this time adding lr.elasticNetParam with only the values [0, 1].
- Call the .build() method on grid and overwrite it with the output.
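As a quick check on the second argument, the sketch below (assuming only NumPy) shows that np.arange(0, .1, .01) stops before .1, so the grid gets ten regParam values. Combined with the two elasticNetParam values, the built grid should therefore hold 10 x 2 = 20 parameter combinations. The values are rounded only for readable output, since floating-point steps can introduce tiny representation errors.

```python
import numpy as np

# Candidate regularization strengths; the stop value .1 is excluded
reg_values = np.arange(0, .1, .01)
# Candidate elastic net mixing values
enet_values = [0, 1]

print(len(reg_values))                     # 10
print(np.round(reg_values, 2).tolist())    # [0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09]
print(len(reg_values) * len(enet_values))  # 20 combinations in the built grid
```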
The sample code below completes each step of the exercise.
# Import NumPy (used for np.arange) and the tuning submodule
import numpy as np
import pyspark.ml.tuning as tune
# Create the parameter grid
grid = tune.ParamGridBuilder()
# Add the hyperparameters; lr is the model Estimator created in an earlier exercise
grid = grid.addGrid(lr.regParam, np.arange(0, .1, .01))
grid = grid.addGrid(lr.elasticNetParam, [0, 1])
# Build the grid
grid = grid.build()