
Compensating for incomplete data

For most datasets, the majority of users will have rated only a small number of items. As you saw in the last exercise, how you deal with users who do not have ratings for an item can greatly influence the validity of your models.

In this exercise, you will fill in missing data with information that should not bias the data that you do have.

You'll compute the average of each user's ratings, then subtract that average from the user's scores so that they are centered around zero. Once the scores are centered, zero is a neutral value, so you can fill the missing entries with zeros: this keeps the impact on each user's overall profile small while still allowing users to be compared.
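To make the centering idea concrete, here is a minimal sketch on a made-up table. It assumes the ratings are held in a pandas DataFrame with one row per user; the names `toy_ratings`, `toy_avg`, `toy_centered` and `toy_filled`, as well as the values, are hypothetical and are not the exercise data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings table: one row per user, one column per item.
# NaN marks an item the user has not rated.
toy_ratings = pd.DataFrame(
    {
        "item_a": [5.0, 2.0, np.nan],
        "item_b": [3.0, np.nan, 4.0],
        "item_c": [np.nan, 4.0, 2.0],
    },
    index=["user_1", "user_2", "user_3"],
)

# Mean rating per user: axis=1 averages across the columns of each row,
# and NaN entries are skipped by default.
toy_avg = toy_ratings.mean(axis=1)

# Subtract each user's mean from that user's row: axis=0 aligns the
# Series of means with the row index.
toy_centered = toy_ratings.sub(toy_avg, axis=0)

# After centering, 0 is the neutral value, so use it for missing ratings.
toy_filled = toy_centered.fillna(0)

print(toy_filled)
```

In this sketch each user's known ratings sum to zero after centering, so the zeros used for the gaps sit exactly at that user's own average rather than at the bottom of the rating scale.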

The table user_ratings_table, with one row per user, has been loaded for you.

This exercise is part of the course

Building Recommendation Engines in Python


Exercise instructions

  • Find the average of the ratings given by each user in user_ratings_table and store them as avg_ratings.
  • Subtract the row averages from each row in user_ratings_table, and store it as user_ratings_table_centered.
  • Fill the empty values in the newly created user_ratings_table_centered with zeros, and store the result as user_ratings_table_normed.

Hands-on interactive exercise

Try to solve this exercise by completing the sample code.

# Get the average rating for each user 
avg_ratings = user_ratings_table.____(axis=____)

# Center each user's ratings around 0
user_ratings_table_centered = user_ratings_table.____(____, axis=0)

# Fill in the missing data with 0s
user_ratings_table_normed = user_ratings_table_centered.____(____)