SMS spam optimised

The pipeline you built earlier for the SMS spam model used the default parameters for all of the elements in the pipeline. It's very unlikely that these parameters will give a particularly good model though. In this exercise you're going to run the pipeline for a selection of parameter values. We're going to do this in a systematic way: the values for each of the hyperparameters will be laid out on a grid and then pipeline will systematically run across each point in the grid.

In this exercise you'll set up a parameter grid which can be used with cross validation to choose a good set of parameters for the SMS spam classifier.

The following are already defined:

hasher — a HashingTF object and
logistic — a LogisticRegression object.

Deze oefening maakt deel uit van de cursus

Machine Learning with PySpark

Cursus bekijken

Oefeninstructies

Create a parameter grid builder object.
Add grid points for numFeatures and binary parameters to the HashingTF object, giving values 1024, 4096 and 16384, and True and False, respectively.
Add grid points for regParam and elasticNetParam parameters to the LogisticRegression object, giving values of 0.01, 0.1, 1.0 and 10.0, and 0.0, 0.5, and 1.0 respectively.
Build the parameter grid.

Praktische interactieve oefening

Probeer deze oefening eens door deze voorbeeldcode in te vullen.

# Create parameter grid
params = ____()

# Add grid for hashing trick parameters
params = params.____(____, ____) \
               .____(____, ____)

# Add grid for logistic regression parameters
params = params.____(____, ____) \
               .____(____, ____)

# Build parameter grid
params = ____.____()

Code bewerken en uitvoeren