Compensating for incomplete data

For most datasets, the majority of users will have rated only a small number of items. As you saw in the last exercise, how you deal with users who do not have ratings for an item can greatly influence the validity of your models.
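To see why the choice matters, here is a minimal sketch of what happens when a user's missing ratings are filled with zeros directly, without any centering first. It assumes pandas and uses a hypothetical user on a 1 to 5 scale who rated two of four items, both highly.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one user who rated only items A and B.
ratings = pd.Series([5.0, 4.0, np.nan, np.nan], index=["A", "B", "C", "D"])

# pandas skips NaN by default, so this averages only the observed ratings.
mean_observed = ratings.mean()

# Naive zero-fill: unrated items are now treated as the worst possible score.
mean_after_zero_fill = ratings.fillna(0).mean()

print(mean_observed, mean_after_zero_fill)  # 4.5 2.25
```

The naive fill halves this user's apparent average, making an enthusiastic rater look lukewarm purely because of items they never rated.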

In this exercise, you will fill in missing data with information that should not bias the data that you do have.

You'll compute the average score each user has given across all of their ratings, then subtract that average from each of the user's scores so that they are centered around zero. You can then fill the empty values with zeros: after centering, zero corresponds to the user's own average, so it is a neutral score that minimizes the impact on the user's overall profile while still allowing users to be compared.
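As a worked illustration for a single user, the sketch below (assuming pandas and hypothetical ratings) centers the scores and then fills the gap with zero. Because zero now equals the user's own average, filling it in leaves that average unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings for one user; item D was not rated.
ratings = pd.Series([5.0, 3.0, 4.0, np.nan], index=["A", "B", "C", "D"])

avg = ratings.mean()           # mean of observed ratings only: 4.0
centered = ratings - avg       # A: 1.0, B: -1.0, C: 0.0, D: NaN
filled = centered.fillna(0)    # the unrated item gets the neutral score 0

print(filled.tolist())  # [1.0, -1.0, 0.0, 0.0]
print(filled.mean())    # 0.0, so the fill did not shift the user's average
```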

The table user_ratings_table, which has one row per user, has been loaded for you.

Find the average of the ratings given by each user in user_ratings_table and store them as avg_ratings.
Subtract each user's average from the corresponding row in user_ratings_table, and store the result as user_ratings_table_centered.
Fill the empty values in the newly created user_ratings_table_centered with zeros.
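The three steps above can be sketched as follows. This assumes user_ratings_table is a pandas DataFrame with users as rows and items as columns; the small table built here, along with its user and item labels, is hypothetical stand-in data, not the exercise's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for user_ratings_table: rows are users, columns are items.
user_ratings_table = pd.DataFrame(
    {
        "item_1": [4.0, 1.0, np.nan],
        "item_2": [np.nan, 2.0, 5.0],
        "item_3": [2.0, np.nan, 3.0],
    },
    index=["user_a", "user_b", "user_c"],
)

# Step 1: average rating per user (row-wise); mean() skips NaN by default.
avg_ratings = user_ratings_table.mean(axis=1)

# Step 2: subtract each user's average from that user's row.
# axis=0 aligns avg_ratings with the row index, so each row gets its own average.
user_ratings_table_centered = user_ratings_table.sub(avg_ratings, axis=0)

# Step 3: replace missing ratings with 0, which is now each user's average.
user_ratings_table_centered = user_ratings_table_centered.fillna(0)

print(user_ratings_table_centered)
```

The `axis=0` argument matters here: writing `user_ratings_table - avg_ratings` would align the Series index (user labels) with the DataFrame's columns (item labels) rather than its rows, and since those labels do not match, the result would be filled with NaN instead of centered scores.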
