Model training and predictions

After splitting the data into training and test sets, in this second part of the exercise you'll train the ALS algorithm on the training data. PySpark MLlib's ALS algorithm has two main parameters: rank (the number of latent factors in the model) and iterations (the number of iterations to run). After training the ALS model, you can use it to predict ratings for the test data. To do this, you pass only the user and item columns from the test dataset to predictAll(), then return the first two rows of its output.
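To make "rank" more concrete, here is a minimal plain-Python sketch of the matrix-factorization idea behind ALS: each user and each item gets a vector of `rank` latent factors, and a predicted rating is the dot product of the two. The factor values, the dictionaries, and the `predict_rating` helper are all made up for illustration; they are not MLlib internals, and a trained model would learn its own factors from the training data.

```python
# Hypothetical latent factors, invented for illustration (rank = 3).
# A trained ALS model learns vectors like these from the training ratings.
user_factors = {1: [0.5, 1.0, 0.0], 2: [1.0, 0.0, 2.0]}
item_factors = {10: [2.0, 1.0, 1.0], 20: [0.0, 3.0, 1.0]}


def predict_rating(user, item):
    """Predicted rating = dot product of the user's and the item's factors."""
    u = user_factors[user]
    v = item_factors[item]
    return sum(a * b for a, b in zip(u, v))


# The rank is simply the length of every factor vector.
rank = len(user_factors[1])

# Score every (user, item) pair, including pairs with no known rating.
predictions = {
    (u, i): predict_rating(u, i) for u in user_factors for i in item_factors
}
```

A higher rank gives each user and item more factors to describe them, at the cost of more computation and a greater risk of overfitting.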

Remember, a SparkContext sc, training_data, and test_data are already available in your workspace.

This exercise is part of the course

Big Data Fundamentals with PySpark

Exercise instructions

  • Train the ALS algorithm on the training data with the configured parameters (rank = 10 and iterations = 10).
  • Drop the rating column, which is the third column, from the test data.
  • Test the model by predicting ratings for the test data.
  • Return a list of the first two rows of the predicted ratings.
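The column-dropping step can be pictured with ordinary Python tuples before applying it to an RDD. This is only a sketch on a small in-memory list: `sample_rows` and its values are hypothetical stand-ins for `test_data`, and the built-in `map` is used here in place of the RDD transformation with the same name.

```python
# Hypothetical (user, product, rating) rows standing in for test_data.
sample_rows = [(1, 10, 4.0), (1, 20, 3.5), (2, 10, 5.0)]

# Keep the first two fields of each row; the rating (third field) is dropped.
rows_no_rating = list(map(lambda p: (p[0], p[1]), sample_rows))

# The slice stands in for returning only the first two rows.
first_two = rows_no_rating[:2]
```

The resulting (user, product) pairs have the shape that a prediction step expects as input: it fills in the rating that was removed.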

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create the ALS model on the training data
model = ALS.____(____, rank=10, iterations=10)

# Drop the ratings column 
testdata_no_rating = test_data.___(lambda p: (p[0], ____))

# Predict ratings with the model
predictions = model.____(testdata_no_rating)

# Return the first 2 rows of the RDD
predictions.____(2)