Create test/train splits and build your ALS model

You already built an ALS model in the previous chapter. We'll do that again here, this time adding the steps needed for a fully cross-validated model.

First, let's import the requisite functions and create our train and test datasets in preparation for the cross-validation step.

This exercise is part of the course

Building Recommendation Engines with PySpark

Exercise instructions

  • Import the RegressionEvaluator from ml.evaluation, the ALS algorithm from ml.recommendation, and the ParamGridBuilder and the CrossValidator from ml.tuning.
  • Create an 80/20 train/test split on the ratings data using the randomSplit method. Name the datasets train and test, and set the random seed to 1234.
  • Build the ALS model, telling Spark which columns in the ratings DataFrame correspond to the userCol, itemCol, and ratingCol. Set the nonnegative argument to True, set the coldStartStrategy to "drop", and let Spark know these are not implicit preferences by setting implicitPrefs to False. Call this model als.
  • Verify that the model was created by calling the type() function on als. The output should indicate what type of model it is.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import the required functions
from pyspark.ml.evaluation import ____
from pyspark.ml.recommendation import ____
from pyspark.ml.tuning import ____, ____

# Create test and train set
(train, test) = ratings.____([0.____, 0.____], seed = ____)

# Create ALS model
als = ALS(userCol="____", itemCol="____", ratingCol="____", nonnegative = ____, implicitPrefs = ____)

# Confirm that a model called "als" was created
type(____)