Running a cross-validated implicit ALS model
Now that we have several ALS models, each with a different set of hyperparameter values, we can train them on a training portion of the msd dataset using cross-validation, then run them on a test set and evaluate how well each one performs using the ROEM function discussed earlier. Unfortunately, this takes too much time for this exercise, so it has been done separately. For your reference, you can evaluate your model_list using the following loop (we are using the msd dataset in this case):
# Split the data into training and test sets
(training, test) = msd.randomSplit([0.8, 0.2])
# Build 5 folds within the training set
train1, train2, train3, train4, train5 = training.randomSplit([0.2, 0.2, 0.2, 0.2, 0.2], seed = 1)
fold1 = train2.union(train3).union(train4).union(train5)
fold2 = train3.union(train4).union(train5).union(train1)
fold3 = train4.union(train5).union(train1).union(train2)
fold4 = train5.union(train1).union(train2).union(train3)
fold5 = train1.union(train2).union(train3).union(train4)
foldlist = [(fold1, train1), (fold2, train2), (fold3, train3), (fold4, train4), (fold5, train5)]
# Empty list to fill with ROEMs from each model
ROEMS = []
# Loop through all models and all folds
for model in model_list:
    for ft_pair in foldlist:
        # Fit the model to the training portion of this fold
        fitted_model = model.fit(ft_pair[0])
        # Generate predictions using fitted_model on the respective CV test data
        predictions = fitted_model.transform(ft_pair[1])
        # Compute and print the ROEM on the CV test data
        r = ROEM(predictions)
        print("ROEM: ", r)
    # Fit the model to all of the training data and generate predictions for the test data
    v_fitted_model = model.fit(training)
    v_predictions = v_fitted_model.transform(test)
    v_ROEM = ROEM(v_predictions)
    # Add the validation ROEM to the ROEMS list
    ROEMS.append(v_ROEM)
    print("Validation ROEM: ", v_ROEM)
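The fold construction in the loop above pairs each held-out partition with the union of the other four. As a rough illustration of that pattern only, here is a minimal sketch that uses plain Python lists in place of Spark DataFrames; make_folds is a hypothetical helper and the sample partitions are made up, neither comes from the course code.

```python
def make_folds(partitions):
    """Pair each partition (held out) with the concatenation of all the others.

    Hypothetical helper: plain lists stand in for the Spark DataFrames
    train1..train5, and list concatenation stands in for .union().
    """
    folds = []
    for i, held_out in enumerate(partitions):
        # Training portion: every row from every partition except the i-th
        train = [row for j, part in enumerate(partitions) if j != i for row in part]
        folds.append((train, held_out))
    return folds


# Made-up data: five equally sized partitions of ten "rows"
parts = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
folds = make_folds(parts)

for train, held_out in folds:
    print("train:", train, "held out:", held_out)
```

Each row appears in exactly one held-out partition, so every row is used for validation once and for training four times.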
To walk you through the steps, the test predictions for 192 models have already been generated, and their ROEM has been calculated. The values are in the ROEMS list provided. Because a list isn't unique to PySpark, and because numpy works really well with lists, we're going to use numpy here. Follow the instructions below to find the best ROEM and the model that provided it.
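To give a feel for what one of these ROEM values represents, here is a minimal sketch in plain Python. It assumes the common rank-ordering error definition for implicit feedback (each user's predictions are ranked from highest to lowest, converted to a percentile rank between 0 and 1, and averaged with play counts as weights), so lower values are better. The function roem_sketch and its tuple layout are hypothetical and are not the ROEM function used in the course, which operates on Spark DataFrames.

```python
from collections import defaultdict


def roem_sketch(rows):
    """Play-count-weighted mean percentile rank (assumed ROEM definition).

    rows: iterable of (user, item, plays, prediction) tuples.
    This layout is a hypothetical stand-in for a predictions DataFrame.
    """
    by_user = defaultdict(list)
    for user, item, plays, prediction in rows:
        by_user[user].append((prediction, plays))

    weighted_rank_sum = 0.0
    total_plays = 0.0
    for entries in by_user.values():
        # Highest prediction first, so the top-ranked item gets percentile 0.0
        entries.sort(key=lambda e: e[0], reverse=True)
        n = len(entries)
        for position, (_, plays) in enumerate(entries):
            pct_rank = position / (n - 1) if n > 1 else 0.0
            weighted_rank_sum += plays * pct_rank
            total_plays += plays

    if total_plays == 0:
        raise ValueError("ROEM is undefined when there are no plays")
    return weighted_rank_sum / total_plays


# Made-up examples: the only played song is ranked first, then last
good = roem_sketch([("u1", "a", 10, 0.9), ("u1", "b", 0, 0.5), ("u1", "c", 0, 0.1)])
bad = roem_sketch([("u1", "a", 0, 0.9), ("u1", "b", 0, 0.5), ("u1", "c", 10, 0.1)])
print(good, bad)  # 0.0 1.0
```

Under this assumed definition, a model that ranks the songs a user actually played near the top scores close to 0, which is why the smallest ROEM identifies the best model.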
This exercise is part of the course
Building Recommendation Engines with PySpark
Exercise instructions
- Import numpy.
- Extract the smallest ROEM from the ROEMS list provided using numpy.argmin(). numpy.argmin() returns the index of the lowest value in the list provided. Call the result i and print i.
- Use list indexing to find the value in the ROEMS list at index i.
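The steps above can be illustrated on a small made-up list. This is a sketch only: example_roems and example_labels are hypothetical names with invented values, not the ROEMS list or the models from the exercise.

```python
import numpy

# Made-up scores and labels, kept in the same order so that
# position k in one list corresponds to position k in the other
example_roems = [0.31, 0.12, 0.27, 0.19]
example_labels = ["model A", "model B", "model C", "model D"]

# numpy.argmin returns the position of the smallest value
best = numpy.argmin(example_roems)
print("Index of smallest value:", best)       # 1
print("Smallest value:", example_roems[best])  # 0.12
print("Best model:", example_labels[best])     # model B
```

Because the scores and the models are stored in the same order, the index returned by numpy.argmin() can be reused to look up the model that produced the lowest score.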
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import numpy
import numpy
# Find the index of the smallest ROEM
i = numpy.____(____)
print("Index of smallest ROEM:", ____)
# Find ith element of ROEMS
print("Smallest ROEM: ", ____[____])