Get startedGet started for free

SMS spam optimised

The pipeline you built earlier for the SMS spam model used the default parameters for all of the elements in the pipeline. It's very unlikely that these parameters will give a particularly good model though. In this exercise you're going to run the pipeline for a selection of parameter values. We're going to do this in a systematic way: the values for each of the hyperparameters will be laid out on a grid and then pipeline will systematically run across each point in the grid.

In this exercise you'll set up a parameter grid which can be used with cross validation to choose a good set of parameters for the SMS spam classifier.

The following are already defined:

  • hasher — a HashingTF object and
  • logistic — a LogisticRegression object.

This exercise is part of the course

Machine Learning with PySpark

View Course

Exercise instructions

  • Create a parameter grid builder object.
  • Add grid points for numFeatures and binary parameters to the HashingTF object, giving values 1024, 4096 and 16384, and True and False, respectively.
  • Add grid points for regParam and elasticNetParam parameters to the LogisticRegression object, giving values of 0.01, 0.1, 1.0 and 10.0, and 0.0, 0.5, and 1.0 respectively.
  • Build the parameter grid.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create parameter grid
params = ____()

# Add grid for hashing trick parameters
params = params.____(____, ____) \
               .____(____, ____)

# Add grid for logistic regression parameters
params = params.____(____, ____) \
               .____(____, ____)

# Build parameter grid
params = ____.____()
Edit and Run Code