Generate predictions and calculate RMSE
Now that we have a model that is trained on our data and tuned through cross-validation, we can see how it performs on the test dataframe. To do this, we'll calculate the RMSE (root mean squared error) of its predictions.
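To make the metric concrete, here is a minimal plain-Python sketch of what RMSE measures: the square root of the average squared difference between actual and predicted values. The function name rmse and the sample numbers are hypothetical and are not taken from the course dataset.

```python
import math


def rmse(actual, predicted):
    """Return the root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and the same length")
    # Square each error so that over- and under-predictions both count as positive.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    # Average the squared errors, then take the square root to return to the original units.
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical ratings and predictions, for illustration only.
ratings = [4.0, 3.0, 5.0]
predictions = [3.5, 3.0, 4.0]
example_rmse = rmse(ratings, predictions)
print(round(example_rmse, 4))
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones, and the result is expressed in the same units as the values being predicted.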
As a side note, generating the test predictions takes more than a few minutes with this dataset. For this reason, the test predictions have already been generated and are provided here as a dataframe called test_predictions. For your reference, they were generated using this code: test_predictions = best_model.transform(test).
This exercise is part of the course Building Recommendation Engines with PySpark.
Exercise instructions
- The dataframe test_predictions contains predictions that our cross-validated ALS model generated using the test set that we created previously. Use the .show() method to take a look at it and see if the predictions seem close to the actual values.
- Use the evaluator that you built previously to calculate the RMSE by calling the .evaluate() method on the test_predictions generated. Call this RMSE.
- Print the RMSE.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# View the predictions
test_predictions.____()
# Calculate and print the RMSE of test_predictions
RMSE = evaluator.____(____)
print(____)