Model training and predictions

After splitting the data into training and test sets, in the second part of the exercise you'll train the ALS algorithm on the training data. PySpark MLlib's ALS algorithm takes the following key parameters: rank (the number of latent factors in the model) and iterations (the number of iterations to run). After training the ALS model, you can use it to predict ratings for the test data. To do this, you will pass only the user and item columns from the test dataset to predictAll() and finally return the first two rows of its output as a list.

Remember, a SparkContext sc, training_data, and test_data are already available in your workspace.

Train the ALS algorithm with the training data and the configured parameters (rank = 10 and iterations = 10).
Drop the rating column in the test data, which is the third column.
Test the model by predicting the rating from the test data.
Return a list of two rows of the predicted ratings.
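
The four steps above can be sketched roughly as follows. This is a minimal sketch, not the official solution: it assumes training_data and test_data are RDDs whose rows are (user, product, rating) triples, and the names drop_rating and train_and_predict are hypothetical helpers introduced only for illustration.

```python
def drop_rating(row):
    """Keep only (user, product) from a (user, product, rating) row."""
    return (row[0], row[1])


def train_and_predict(training_data, test_data, rank=10, iterations=10):
    """Train ALS on training_data and return up to two predicted rows.

    Hypothetical helper: assumes both arguments are RDDs of
    (user, product, rating) rows, as described in the exercise.
    """
    # Imported here so drop_rating() can be used without Spark installed.
    from pyspark.mllib.recommendation import ALS

    # Fit the ALS model on the training ratings.
    model = ALS.train(training_data, rank=rank, iterations=iterations)

    # predictAll() expects (user, product) pairs, so drop the third column.
    testdata_no_rating = test_data.map(drop_rating)

    # Predict a rating for each held-out (user, product) pair.
    predictions = model.predictAll(testdata_no_rating)

    # take(2) collects at most two predicted rows into a Python list.
    return predictions.take(2)
```

In the exercise workspace this would be called as train_and_predict(training_data, test_data). The rating column is dropped before prediction because the model should see only which user and item to score, not the true rating it is being tested against.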
