Validating your predictions
1. Validating your predictions
You have now worked through a few different approaches to predicting how a user may feel about an item they have not seen, and from that you are able to recommend new items to them that they are most likely to enjoy. For all of these approaches, we have inspected the data to see if it looked correct, but we have not discussed how to measure how accurate these predictions are.

2. Hold-out sets
What makes recommendation engines a little different when measuring predictions is that in more traditional machine learning models, you are trying to predict a single feature or column, but with recommendation engines, what you are trying to predict is far more inconsistent. Almost every user has reviewed different items, and each item has received reviews from different groups of users. For this reason, we cannot split our hold-out set in the same way that we can for typical machine learning. In those cases, we would just split off a proportion of the rows and use them to test our predictions, as you see on the left. For recommendation engines, on the other hand, we need to remove a different chunk of the DataFrame, as seen on the right.

8. Separating the hold-out set
This can be done in Python by first extracting the area you wish to compare. In our case, we will focus on the top left-hand corner of our base DataFrame, consisting of the first 20 rows and the first 100 columns, selected here using iloc. We then blank out that area with NaNs, as this is what we will be predicting. Next, we repeat the factorization from the last lesson to fill out the full DataFrame and take the subset of the predicted DataFrame that corresponds to the area we blanked out. We now have the predicted values and the original actual values that were not used to make the predictions.
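As a rough sketch of those steps, assuming the user-item ratings DataFrame is called user_ratings_df and that predict_ratings is a hypothetical stand-in for the factorization step from the last lesson:

import numpy as np

# Keep a copy of the original top-left corner so we can compare against it later
actual_values = user_ratings_df.iloc[:20, :100].copy()

# Blank out that corner with NaNs; these are the values we will try to predict
user_ratings_df.iloc[:20, :100] = np.nan

# Re-run the matrix factorization from the last lesson on the masked DataFrame
# (predict_ratings is a hypothetical placeholder for that step), then take the
# matching corner of the predicted DataFrame
predicted_df = predict_ratings(user_ratings_df)
predicted_values = predicted_df.iloc[:20, :100]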

9. Masking the hold-out set
As we only want to compare the values that did exist, we mask the DataFrame so that only non-missing fields are compared. Now, if we take a look at the masked original DataFrame and the predicted one, we can see that only the values we want to compare are present.
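A minimal sketch of that masking, reusing actual_values and predicted_values from the step above:

# Boolean DataFrame that is True wherever an actual rating existed
mask = ~actual_values.isnull()

# Indexing with the boolean DataFrame keeps only the True positions and
# replaces everything else with NaN, so both frames line up for comparison
masked_actual = actual_values[mask]
masked_predicted = predicted_values[mask]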

10. Introducing RMSE (root mean squared error)
The metric most commonly used to measure how good a model is at predicting recommendations is called the root mean squared error, or RMSE for short. With RMSE, we first calculate how far from the ground truth each prediction was (this is the error part of RMSE). We then square this, as we only care about how wrong the prediction is, not in which direction. We then find the average squared error: the sum of all the squared errors divided by the total number of predictions, which for the example shown here would be 5 over 3. Finally, we take the square root of this value. This gives us a good measure of how close a set of predictions is to the actual values, and it is very useful for comparing models.
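To make those steps concrete, here is a small worked example with hypothetical numbers chosen so that the squared errors sum to 5 across 3 predictions, matching the 5 over 3 average mentioned above (the exact values on the slide may differ):

import numpy as np

actual = np.array([4.0, 3.0, 5.0])
predicted = np.array([2.0, 4.0, 5.0])

errors = predicted - actual          # how far off each prediction is: [-2, 1, 0]
squared_errors = errors ** 2         # squaring removes the direction: [4, 1, 0]
mse = squared_errors.mean()          # (4 + 1 + 0) / 3 = 5 / 3
rmse = np.sqrt(mse)                  # square root of the average, about 1.29
print(rmse)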

16. RMSE in Python
The root mean squared error can be found in Python using sklearn's mean_squared_error function, which takes the two sets of data you want to compare as its first and second arguments. We set the optional argument squared to False so that we calculate the root mean squared error as opposed to the mean squared error.
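Putting this together with the hold-out data from earlier (a sketch, assuming actual_values, predicted_values, and mask are defined as above):

import numpy as np
from sklearn.metrics import mean_squared_error

# Flatten the hold-out corner down to just the positions that had real ratings
actual = actual_values.values[mask.values]
predicted = predicted_values.values[mask.values]

# squared=False returns the root mean squared error instead of the MSE
rmse = mean_squared_error(actual, predicted, squared=False)
print(rmse)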

17. Let's practice!
Great, now that you know how to measure how good your recommendations are, let's compare two different approaches.