
Compensating for incomplete data

For most datasets, the majority of users will have rated only a small number of items. As you saw in the last exercise, how you deal with users who do not have ratings for an item can greatly influence the validity of your models.
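To see why the treatment of missing ratings matters, here is a minimal sketch using a made-up user (the item names and values are hypothetical, not taken from the course data). It shows how naively filling unrated items with 0 on the raw rating scale makes the user look far more negative than their actual ratings suggest.

```python
import numpy as np
import pandas as pd

# Hypothetical user who rated two of four items, both highly (NaN = not rated)
ratings = pd.Series(
    [5.0, 4.0, np.nan, np.nan],
    index=["Item A", "Item B", "Item C", "Item D"],
)

# pandas skips NaN by default, so this averages only the rated items
mean_rated = ratings.mean()

# Filling with 0 on the raw scale treats "not rated" as a very low score
mean_zero_filled = ratings.fillna(0).mean()

print(mean_rated, mean_zero_filled)  # 4.5 2.25
```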

In this exercise, you will fill in the missing data with values that should not bias the data you already have.

You'll compute the average score each user has given across all their ratings, and then subtract this average from their ratings to center each user's scores around zero. Finally, you'll fill in the empty values with zeros. After centering, zero is a neutral score, so it minimizes the impact on a user's overall profile while still allowing users to be compared.
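The centering idea can be sketched on a small made-up table. This is an illustration under assumptions, not the course's solution: the table `toy_ratings`, its users, and its items are hypothetical, and it assumes the ratings are held in a pandas DataFrame with one row per user.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings: rows are users, columns are items, NaN = not rated
toy_ratings = pd.DataFrame(
    {
        "Item A": [5.0, 2.0, np.nan],
        "Item B": [3.0, np.nan, 4.0],
        "Item C": [np.nan, 4.0, 2.0],
    },
    index=["user_1", "user_2", "user_3"],
)

# Row-wise mean; NaN values are skipped, so only rated items count
user_means = toy_ratings.mean(axis=1)

# Subtract each user's mean from that user's row (axis=0 aligns on the row index)
toy_centered = toy_ratings.sub(user_means, axis=0)

# Unrated items become 0, which is each user's neutral point after centering
toy_filled = toy_centered.fillna(0)

print(toy_filled)
```

In this toy table, user_1 has a mean rating of 4, so their ratings of 5 and 3 become 1 and -1, and the unrated Item C becomes 0. Each row of the filled table sums to zero, so the filled-in values do not shift any user's average.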

user_ratings_table, which has one row per user, has been loaded for you.

This exercise is part of the course

Building Recommendation Engines in Python


Exercise instructions

  • Find the average of the ratings given by each user in user_ratings_table and store them as avg_ratings.
  • Subtract the row averages from each row in user_ratings_table, and store the result as user_ratings_table_centered.
  • Fill the empty values in the newly created user_ratings_table_centered with zeros, and store the result as user_ratings_table_normed.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Get the average rating for each user 
avg_ratings = user_ratings_table.____(axis=____)

# Center each user's ratings around 0
user_ratings_table_centered = user_ratings_table.____(____, axis=0)

# Fill in the missing data with 0s
user_ratings_table_normed = user_ratings_table_centered.____(____)