SMS spam optimised
The pipeline you built earlier for the SMS spam model used the default parameters for every element in the pipeline. Those defaults are unlikely to give a particularly good model. In this exercise you're going to run the pipeline for a selection of parameter values. We're going to do this in a systematic way: the values for each of the hyperparameters will be laid out on a grid, and the pipeline will then be run at each point in the grid.
In this exercise you'll set up a parameter grid which can be used with cross validation to choose a good set of parameters for the SMS spam classifier.
The following are already defined:
- hasher: a HashingTF object
- logistic: a LogisticRegression object
This exercise is part of the course
Machine Learning with PySpark
Exercise instructions
- Create a parameter grid builder object.
- Add grid points for the numFeatures and binary parameters of the HashingTF object, giving values of 1024, 4096 and 16384, and True and False, respectively.
- Add grid points for the regParam and elasticNetParam parameters of the LogisticRegression object, giving values of 0.01, 0.1, 1.0 and 10.0, and 0.0, 0.5 and 1.0, respectively.
- Build the parameter grid.
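To get a feel for how large this grid is, the sketch below enumerates its points in plain Python, without Spark. It is only an illustration of the idea of a grid as every combination of the listed values. The names grid_values and grid_points are made up for this example and are not part of the exercise or of PySpark.

```python
from itertools import product

# Candidate values copied from the exercise instructions.
# grid_values is a hypothetical name used only for this illustration.
grid_values = {
    "numFeatures": [1024, 4096, 16384],
    "binary": [True, False],
    "regParam": [0.01, 0.1, 1.0, 10.0],
    "elasticNetParam": [0.0, 0.5, 1.0],
}

# Each grid point is one combination of values (the Cartesian product).
names = list(grid_values)
grid_points = [
    dict(zip(names, combo)) for combo in product(*grid_values.values())
]

print(len(grid_points))  # 3 * 2 * 4 * 3 = 72
print(grid_points[0])
```

With k-fold cross validation, each of these 72 points is fitted k times, so the cost of the search grows quickly as values are added to the grid.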
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Create parameter grid
params = ____()
# Add grid for hashing trick parameters
params = params.____(____, ____) \
.____(____, ____)
# Add grid for logistic regression parameters
params = params.____(____, ____) \
.____(____, ____)
# Build parameter grid
params = ____.____()