Create test/train splits and build your ALS model
You already know how to build an ALS model, having done so in the previous chapter. We'll do that again here, but this time we'll take some additional steps to build out a fully cross-validated model.
First, let's import the requisite functions and create our train and test datasets in preparation for the cross-validation step.
This exercise is part of the course
Building Recommendation Engines with PySpark
Instructions
- Import the RegressionEvaluator from ml.evaluation, the ALS algorithm from ml.recommendation, and the ParamGridBuilder and CrossValidator from ml.tuning.
- Create an .80/.20 train/test split on the ratings data using the randomSplit method. Name the datasets train and test, and set the random seed to 1234.
- Build out the ALS model, telling Spark the names of the columns in the ratings dataframe that correspond to userCol, itemCol, and ratingCol. Set the nonnegative argument to True, set the coldStartStrategy to "drop", and let Spark know that these are not implicit preferences by setting the implicitPrefs argument to False. Call this model als.
- Verify that the model was created by calling the type() function on als. The output should indicate what type of model it is.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Import the required functions
from pyspark.ml.evaluation import ____
from pyspark.ml.recommendation import ____
from pyspark.ml.tuning import ____, ____
# Create test and train set
(train, test) = ratings.____([0.____, 0.____], seed = ____)
# Create ALS model
als = ALS(userCol="____", itemCol="____", ratingCol="____", nonnegative = ____, coldStartStrategy = "____", implicitPrefs = ____)
# Confirm that a model called "als" was created
type(____)
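
For reference, a minimal sketch of what the completed exercise could look like is shown below. The column names userId, movieId, and rating are assumptions based on a typical MovieLens-style ratings dataframe; substitute whatever column names your ratings data actually uses. The RegressionEvaluator, ParamGridBuilder, and CrossValidator imports are not used yet here, but they will be needed in the cross-validation step that follows.

# A possible completed version of the exercise (column names are assumed)
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator

# Create an 80/20 train/test split with a fixed seed for reproducibility
(train, test) = ratings.randomSplit([0.8, 0.2], seed=1234)

# Build the ALS model on explicit ratings; coldStartStrategy="drop" removes
# rows that cannot be scored during evaluation
als = ALS(userCol="userId", itemCol="movieId", ratingCol="rating",
          nonnegative=True, coldStartStrategy="drop", implicitPrefs=False)

# Confirm that a model called "als" was created
print(type(als))  # <class 'pyspark.ml.recommendation.ALS'>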