Build out an ALS model
Now specify your first ALS model by completing the code below.
Recall that you can use the .columns attribute of the ratings DataFrame to see the names of the columns that contain the user, movie, and rating data. Spark needs these column names to run ALS correctly.
This exercise is part of the course Building Recommendation Engines with PySpark.
Exercise instructions
- Before building the ALS model, split the data into training and test sets. Use the randomSplit() method to split the ratings DataFrame into training_data and test_data with a 0.8/0.2 split, and set the random number generator's seed to 42.
- Tell Spark which columns contain the user, item, and rating data by setting userCol, itemCol, and ratingCol. Use the .columns attribute if needed. Then complete the hyperparameters: set rank to 10, maxIter to 15, regParam (lambda) to 0.1, coldStartStrategy to "drop", and nonnegative to True. Because the data contains explicit ratings, set implicitPrefs to False.
- Fit the als model to the training_data portion of the ratings data by calling als.fit() on training_data. Store the fitted model in a variable named model.
- Generate predictions on the test_data portion of the ratings data by calling model.transform() on test_data. Store the predictions in a variable named test_predictions. Optionally, view them by calling .show() on test_predictions.
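For intuition about what rank, regParam, maxIter, and nonnegative control, the explicit-feedback objective that ALS minimizes can be sketched as follows. This follows the formulation described in the Spark MLlib collaborative filtering documentation, which scales regParam by the number of ratings per user or item (the "ALS-WR" approach); the symbols below are introduced here for illustration and do not appear in the exercise. Here x_u and y_i are the user and item factor vectors of length k (the rank), λ is regParam, Ω is the set of observed (user, item) pairs, and n_u, n_i are the rating counts of user u and item i.

```latex
\min_{X,\,Y} \sum_{(u,i) \in \Omega} \left( r_{ui} - x_u^{\top} y_i \right)^2
  + \lambda \left( \sum_{u} n_u \lVert x_u \rVert^2 + \sum_{i} n_i \lVert y_i \rVert^2 \right),
\qquad x_u,\, y_i \in \mathbb{R}^{k}
```

Each of the maxIter iterations alternates between fixing Y and solving a regularized least-squares problem for every x_u, then fixing X and solving for every y_i. Setting nonnegative=True adds the constraint that all factor entries are nonnegative, which Spark handles with a nonnegative least-squares solver.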
Have a go at this exercise by completing this sample code.
# Split the ratings dataframe into training and test data
(training_data, test_data) = ratings.____([____, ____], seed=42)
# Set the ALS hyperparameters
from pyspark.ml.recommendation import ALS
als = ALS(userCol="____", itemCol="____", ratingCol="____", rank=____, maxIter=____, regParam=____,
          coldStartStrategy="____", nonnegative=____, implicitPrefs=____)
# Fit the model to the training_data
____ = ____.fit(____)
# Generate predictions on the test_data
____ = ____.transform(____)
# Display the first 20 rows of the predictions
test_predictions.show()
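To make the roles of the hyperparameters concrete without a Spark cluster, here is a minimal NumPy sketch of the same pipeline on hypothetical toy data. It is not Spark's implementation: random_split, fit_als, and transform are hypothetical helpers written for illustration. The split only mimics the weighted random sampling behind randomSplit, nonnegativity is not enforced (so it corresponds to nonnegative=False), and cold-start rows are dropped by skipping test pairs whose user or item never appears in training, mirroring coldStartStrategy="drop".

```python
import random

import numpy as np


def random_split(rows, train_weight, test_weight, seed):
    """Hypothetical helper: send each row to train or test by a weighted coin flip.

    Assumption: this only mimics the idea behind DataFrame.randomSplit; split
    sizes are approximately, not exactly, proportional to the weights.
    """
    rng = random.Random(seed)
    p_train = train_weight / (train_weight + test_weight)
    train, test = [], []
    for row in rows:
        (train if rng.random() < p_train else test).append(row)
    return train, test


def fit_als(train, n_users, n_items, rank=10, max_iter=15, reg_param=0.1, seed=42):
    """Hypothetical helper: unconstrained ALS on explicit (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    user_f = rng.normal(scale=0.1, size=(n_users, rank))  # user factor vectors
    item_f = rng.normal(scale=0.1, size=(n_items, rank))  # item factor vectors
    by_user = {u: [] for u in range(n_users)}
    by_item = {i: [] for i in range(n_items)}
    for u, i, r in train:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    eye = np.eye(rank)

    def solve(fixed, observed):
        # Ridge regression with the penalty scaled by the rating count (ALS-WR style).
        idx = [j for j, _ in observed]
        y = np.array([r for _, r in observed])
        a = fixed[idx]
        return np.linalg.solve(a.T @ a + reg_param * len(idx) * eye, a.T @ y)

    for _ in range(max_iter):
        # Fix item factors, solve for each user; then fix user factors, solve for each item.
        for u, obs in by_user.items():
            if obs:  # users with no training ratings keep their random initialization
                user_f[u] = solve(item_f, obs)
        for i, obs in by_item.items():
            if obs:
                item_f[i] = solve(user_f, obs)
    seen_users = {u for u, obs in by_user.items() if obs}
    seen_items = {i for i, obs in by_item.items() if obs}
    return user_f, item_f, seen_users, seen_items


def transform(test, user_f, item_f, seen_users, seen_items):
    """Hypothetical helper: predict ratings, skipping users/items unseen in training."""
    preds = []
    for u, i, r in test:
        if u in seen_users and i in seen_items:
            preds.append((u, i, r, float(user_f[u] @ item_f[i])))
    return preds


# Hypothetical toy data: about 60% of the entries of a random rank-2 rating matrix.
data_rng = np.random.default_rng(0)
n_users, n_items = 20, 15
true_u = data_rng.uniform(0.5, 1.5, size=(n_users, 2))
true_i = data_rng.uniform(0.5, 1.5, size=(n_items, 2))
rows = [(u, i, float(true_u[u] @ true_i[i]))
        for u in range(n_users) for i in range(n_items)
        if data_rng.random() < 0.6]

training_data, test_data = random_split(rows, 0.8, 0.2, seed=42)
user_f, item_f, seen_u, seen_i = fit_als(training_data, n_users, n_items)
test_predictions = transform(test_data, user_f, item_f, seen_u, seen_i)
print(test_predictions[:5])
```

Because each row is assigned independently, the split is only approximately 80/20; randomSplit likewise does not guarantee exact split sizes. In the sketch, a larger rank or maxIter typically lets the model fit the training ratings more closely, while a larger reg_param shrinks the factor vectors toward zero.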