Model training and predictions
After splitting the data into training and test sets, in the second part of the exercise you'll train the ALS algorithm on the training data. PySpark MLlib's ALS algorithm has two main parameters that you'll set here: rank (the number of latent factors in the model) and iterations (the number of iterations to run). After training the ALS model, you can use it to predict ratings for the test data. To do this, you'll pass the user and item columns from the test dataset to predictAll() and then return a list of the first 2 rows of its output.
Remember, a SparkContext sc, training_data, and test_data are already available in your workspace.
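To get a feel for what rank and iterations control, here is a toy sketch of alternating least squares in plain Python. It is not how MLlib implements ALS internally, and it is far too slow for real data. The helper names (solve, update_factors, train_als, predict), the regularization value lam, and the sample ratings are all assumptions made up for illustration.

```python
import random


def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    # Augment a with b so each row operation updates both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def update_factors(ratings_by_key, fixed, rank, lam):
    """Re-solve every factor vector on one side while the other side stays fixed."""
    updated = {}
    for key, observed in ratings_by_key.items():
        # Regularized normal equations: (sum f f^T + lam * I) x = sum rating * f
        a = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, rating in observed:
            f = fixed[other]
            for i in range(rank):
                b[i] += rating * f[i]
                for j in range(rank):
                    a[i][j] += f[i] * f[j]
        updated[key] = solve(a, b)
    return updated


def train_als(ratings, rank, iterations, lam=0.1, seed=0):
    """Factor (user, item, rating) triples into user and item vectors of length rank."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for user, item, rating in ratings:
        by_user.setdefault(user, []).append((item, rating))
        by_item.setdefault(item, []).append((user, rating))
    # Start from random item factors; user factors are solved in the first pass.
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users = {}
    for _ in range(iterations):
        users = update_factors(by_user, items, rank, lam)  # items held fixed
        items = update_factors(by_item, users, rank, lam)  # users held fixed
    return users, items


def predict(users, items, user, item):
    """A predicted rating is the dot product of the two factor vectors."""
    return sum(u * v for u, v in zip(users[user], items[item]))


# Hypothetical (user, item, rating) triples, made up for illustration.
ratings = [(1, 1, 5.0), (1, 2, 1.0), (2, 1, 4.0),
           (2, 2, 1.0), (3, 1, 1.0), (3, 2, 5.0)]
users, items = train_als(ratings, rank=2, iterations=10)
print(predict(users, items, 1, 1))
```

In this sketch, rank sets the length of each user and item vector, and iterations sets how many times the two sides are re-solved in turn. A larger rank can capture more structure in the ratings but costs more per iteration.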
This exercise is part of the course Big Data Fundamentals with PySpark.
Exercise instructions
- Train the ALS algorithm on the training data with the configured parameters (rank = 10 and iterations = 10).
- Drop the rating column, which is the third column, from the test data.
- Test the model by predicting the ratings for the test data.
- Return a list of two rows of the predicted ratings.
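The "drop the rating column" and "return two rows" steps are easy to picture on ordinary Python data. The sketch below uses a plain list as a stand-in for the test RDD, so it runs without Spark; the sample rows and the names test_rows and rows_no_rating are made up for illustration.

```python
# Hypothetical (user, product, rating) rows standing in for test_data.
test_rows = [(1, 10, 4.0), (1, 20, 3.5), (2, 10, 5.0)]

# Keep only the user and product IDs; index 2 (the rating) is dropped.
rows_no_rating = list(map(lambda p: (p[0], p[1]), test_rows))

# Slicing returns the first two rows as a list.
first_two = rows_no_rating[:2]
print(first_two)
```

On an RDD, the corresponding operations would be map() for the per-row transformation and take(2) for fetching the first two rows as a Python list.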
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create the ALS model on the training data
model = ALS.____(____, rank=10, iterations=10)
# Drop the ratings column
testdata_no_rating = test_data.___(lambda p: (p[0], ____))
# Predict the ratings for the test data
predictions = model.____(testdata_no_rating)
# Return the first 2 rows of the RDD
predictions.____(2)