Create test/train splits and build your ALS model
You already know how to build an ALS model from the previous chapter. This time, we'll take some additional steps to build a fully cross-validated model.
First, let's import the required functions and create our train and test datasets in preparation for the cross-validation step.
This exercise is part of the course Building Recommendation Engines with PySpark.
Exercise instructions
- Import RegressionEvaluator from pyspark.ml.evaluation, the ALS algorithm from pyspark.ml.recommendation, and ParamGridBuilder and CrossValidator from pyspark.ml.tuning.
- Create an 80/20 train/test split on the ratings data using the randomSplit method. Name the datasets train and test, and set the random seed to 1234.
- Build the ALS model, telling Spark which columns in the ratings dataframe correspond to userCol, itemCol and ratingCol. Set the nonnegative argument to True and coldStartStrategy to "drop", and tell Spark that these ratings are explicit rather than implicit preferences by setting the implicitPrefs argument to False. Call this model als.
- Verify that the model was created by calling the type() function on als. The output should show its class, pyspark.ml.recommendation.ALS.
Have a go at this exercise by completing this sample code.
# Import the required functions
from pyspark.ml.evaluation import ____
from pyspark.ml.recommendation import ____
from pyspark.ml.tuning import ____, ____
# Create test and train set
(train, test) = ratings.___([0.____, 0.____], seed = ____)
# Create ALS model
als = ALS(userCol="____", itemCol="____", ratingCol="____", nonnegative = ____, coldStartStrategy = ____, implicitPrefs = ____)
# Confirm that a model called "als" was created
type(____)