Compensating for incomplete data

For most datasets, the majority of users will have rated only a small number of items. As you saw in the last exercise, how you deal with users who do not have ratings for an item can greatly influence the validity of your models.

In this exercise, you will fill in missing data with information that should not bias the data that you do have.
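To see why the choice of fill value matters, here is a minimal sketch on a small hypothetical ratings table. The names `ratings`, `user_1`, `item_a` and so on are illustrative and not part of the exercise. Filling every missing rating with 0 before doing anything else pulls each user's average down, because each gap is counted as if the user had given that item a score of 0.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings: rows are users, columns are items, NaN = not rated
ratings = pd.DataFrame(
    {"item_a": [5.0, 2.0], "item_b": [4.0, np.nan], "item_c": [np.nan, 1.0]},
    index=["user_1", "user_2"],
)

# Mean of the ratings each user actually gave (mean() skips NaN by default)
true_means = ratings.mean(axis=1)

# Naively treating "not rated" as a rating of 0 drags each user's mean down
naive_means = ratings.fillna(0).mean(axis=1)

print(true_means)
print(naive_means)
```

In this toy table, user_1's average falls from 4.5 to 3.0 and user_2's from 1.5 to 1.0, purely because of how the gaps were filled.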

You'll compute the average score each user has given across all of their ratings, and then subtract that average from the user's scores to center them around zero. Once the ratings are centered, you can fill the empty values with zeros: a zero now means "exactly this user's average", a neutral score that has minimal impact on the user's overall profile while still allowing users to be compared.
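The three steps can be sketched on a toy table, assuming pandas is available and using a hypothetical `ratings` DataFrame in place of user_ratings_table. After centering, each user's rated items sum to zero, so filling the gaps with 0 leaves every user's average at 0 rather than shifting it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for user_ratings_table
ratings = pd.DataFrame(
    {"item_a": [5.0, 2.0], "item_b": [4.0, np.nan], "item_c": [np.nan, 1.0]},
    index=["user_1", "user_2"],
)

# 1. Average rating per user (row-wise mean, NaN ignored)
avg = ratings.mean(axis=1)

# 2. Subtract each user's average from that user's row
centered = ratings.sub(avg, axis=0)

# 3. Missing ratings become 0, i.e. "exactly this user's average"
filled = centered.fillna(0)

print(filled)
print(filled.mean(axis=1))  # every user's average is still 0 after filling
```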

A pandas DataFrame, user_ratings_table, with one row per user has been loaded for you.

This exercise is part of the course

Building Recommendation Engines in Python

Instructions

Find the average of the ratings given by each user in user_ratings_table and store them as avg_ratings.
Subtract each user's average from that user's row in user_ratings_table, and store the result as user_ratings_table_centered.
Fill the empty values in the newly created user_ratings_table_centered with zeros.

Hands-on interactive exercise

Try this exercise using the sample code below.

# Get the average rating for each user (row-wise mean; NaN values are skipped)
avg_ratings = user_ratings_table.mean(axis=1)

# Center each user's ratings around 0 (axis=0 aligns avg_ratings with the rows)
user_ratings_table_centered = user_ratings_table.sub(avg_ratings, axis=0)

# Fill in the missing data with 0s
user_ratings_table_normed = user_ratings_table_centered.fillna(0)
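The `axis=0` argument in the centering step is what makes the subtraction line up with rows. A minimal sketch on a hypothetical two-user table shows the difference: with `axis=0`, pandas aligns the Series of averages with the row index (users); with the default `axis='columns'`, it tries to align the user names with the item column labels, finds no matches, and returns a table full of NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical two-user table (not the exercise's user_ratings_table)
ratings = pd.DataFrame(
    {"item_a": [5.0, 2.0], "item_b": [4.0, np.nan]},
    index=["user_1", "user_2"],
)
avg = ratings.mean(axis=1)  # Series indexed by user name

# axis=0: align avg with the row index, so each row gets its own user's mean
by_row = ratings.sub(avg, axis=0)

# Default axis="columns": pandas aligns avg's index (user names) with the
# column labels (item names); nothing matches, so every cell comes out NaN
by_col = ratings.sub(avg)

print(by_row)
print(by_col)
```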
