Matrix sparsity
A common challenge with real-world ratings data is that most users will not have rated most items, and most items will only have been rated by a small number of users. This results in a mostly empty, or sparse, DataFrame.
In this exercise, you will calculate how sparse the movie_lens ratings data is by counting the number of occupied cells and comparing it to the size of the full DataFrame.
The DataFrame user_ratings_df that you have used in previous exercises, containing a row per user and a column per movie, has been loaded for you.
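To make the shape of such a DataFrame concrete, here is a minimal sketch of how a user-by-movie table might be built from long-format ratings. The data, the column names (userId, title, rating), and the name toy_ratings_df are all hypothetical and are not taken from the course dataset; the point is only that every user/movie pair without a rating ends up as NaN.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, movie, rating).
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 3],
    "title": ["Movie A", "Movie B", "Movie B", "Movie C"],
    "rating": [4.0, 5.0, 3.0, 2.0],
})

# Reshape to one row per user and one column per movie.
# User/movie pairs with no rating are filled with NaN.
toy_ratings_df = ratings.pivot(index="userId", columns="title", values="rating")
print(toy_ratings_df)
```

Even in this tiny example, more than half of the cells are empty, because each user has rated only one or two of the three movies.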
This exercise is part of the course Building Recommendation Engines in Python.
Exercise instructions
- Count the number of non-empty cells in user_ratings_df and store the result as sparsity_count.
- Count the total number of cells in the user_ratings_df DataFrame and store it as full_count.
- Calculate the sparsity of the DataFrame by dividing the number of non-empty cells by the total number of cells and print the result.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Count the occupied cells
sparsity_count = user_ratings_df.____().____.____()
# Count all cells
full_count = user_ratings_df.____
# Find the sparsity of the DataFrame
sparsity = ____ / ____
print(sparsity)
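The same calculation can be sketched on a small stand-in DataFrame. This is one possible approach rather than the exercise's reference solution: toy_df and its values are hypothetical, and the variable names are chosen for illustration. It relies on pandas' notnull() to flag occupied cells and on the size attribute for the total cell count.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-user by 3-movie ratings table; NaN marks a missing rating.
toy_df = pd.DataFrame(
    {
        "Movie A": [4.0, np.nan, np.nan],
        "Movie B": [5.0, 3.0, np.nan],
        "Movie C": [np.nan, np.nan, 2.0],
    },
    index=[1, 2, 3],
)

# notnull() gives a boolean table; summing the underlying array counts the True cells.
occupied = toy_df.notnull().values.sum()

# size is the number of rows multiplied by the number of columns.
total = toy_df.size

# Fraction of cells that hold a rating; the closer to 0, the emptier the table.
fraction_occupied = occupied / total
print(fraction_occupied)
```

Note that the ratio as defined in this exercise measures the share of occupied cells, so a small value indicates a very sparse DataFrame.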