Comparing recommendation methods
In this course, you have predicted how you believe a user would rate movies they have not seen, using multiple different methods (basic average ratings, KNN, matrix factorization). In this final exercise, you'll work through a comparison of the averaged ratings and matrix factorization, using mean_squared_error() as the measure of how well they are performing.
The predictions based on averages have been loaded as avg_pred_ratings_df, while the calculated predictions have been loaded as calc_pred_ratings_df. The ground truth values have been loaded as act_ratings_df. Finally, the mean_squared_error() function has been imported for your use from sklearn.metrics.
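For intuition, here is a minimal sketch of how mean_squared_error() scores a set of predictions against ground truth. The rating arrays below are made up purely for illustration; they are not the course data.

import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical ratings for illustration only (not the course data)
actual = np.array([4.0, 3.0, 5.0, 2.0])
preds_a = np.array([3.5, 3.0, 4.5, 2.5])   # e.g. average-based predictions
preds_b = np.array([4.1, 2.9, 4.8, 2.2])   # e.g. factorization-based predictions

# A lower error means the predictions sit closer to the true ratings
print(mean_squared_error(actual, preds_a))  # 0.1875
print(mean_squared_error(actual, preds_b))  # ~0.025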
This exercise is part of the course Building Recommendation Engines in Python.
Exercise instructions
- Extract the first 20 rows and first 100 columns (the areas you want to compare) from the act_ratings_df, avg_pred_ratings_df, and calc_pred_ratings_df DataFrames.
- Create a mask of the actual_values DataFrame that targets only non-empty cells (see the masking sketch after this list).
- Find the mean squared error between the two sets of predictions and the ground truth values (note that the sample code passes squared=False, so it actually reports the root mean squared error).
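The mask step matters because the ground truth matrix is sparse: cells where a user never rated a movie hold NaN, and including them would corrupt the error calculation. A minimal sketch of the technique, using a made-up array rather than the course data:

import numpy as np

# Hypothetical sparse ratings (NaN = no rating given)
actual = np.array([[4.0, np.nan, 3.0],
                   [np.nan, 5.0, np.nan]])

# True wherever a rating actually exists
mask = ~np.isnan(actual)
print(mask)
# [[ True False  True]
#  [False  True False]]

# Indexing with the mask keeps only the observed ratings (flattened)
print(actual[mask])  # [4. 3. 5.]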
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Extract the ground truth to compare your predictions against
actual_values = act_ratings_df.____[:20, :100].values
avg_values = avg_pred_ratings_df.____[:20, :100].values
predicted_values = calc_pred_ratings_df.____[:20, :100].values
# Create a mask of actual_values to only look at the non-missing values in the ground truth
mask = ~np.isnan(____)
# Print the performance of both predictions and compare
print(____(____[mask], avg_values[mask], squared=False))
print(____(____[mask], predicted_values[mask], squared=False))
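If you would like to check your work outside the interactive console, here is one possible completed version. It is a sketch that assumes act_ratings_df, avg_pred_ratings_df, and calc_pred_ratings_df are loaded as described above. Note that squared=False makes mean_squared_error() return the root mean squared error (RMSE), and that recent scikit-learn releases deprecate this argument in favor of root_mean_squared_error().

import numpy as np
from sklearn.metrics import mean_squared_error

# Assumes act_ratings_df, avg_pred_ratings_df, and calc_pred_ratings_df
# have already been loaded as described above

# Extract the ground truth to compare your predictions against
actual_values = act_ratings_df.iloc[:20, :100].values
avg_values = avg_pred_ratings_df.iloc[:20, :100].values
predicted_values = calc_pred_ratings_df.iloc[:20, :100].values

# Create a mask of actual_values to only look at the non-missing values
mask = ~np.isnan(actual_values)

# Print the performance of both predictions and compare
# (squared=False returns the RMSE rather than the MSE)
print(mean_squared_error(actual_values[mask], avg_values[mask], squared=False))
print(mean_squared_error(actual_values[mask], predicted_values[mask], squared=False))

Whichever line prints the smaller value is the better-performing method on this slice of the data.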