Matrix sparsity

A common challenge with real-world ratings data is that most users will not have rated most items, and most items will only have been rated by a small number of users. This results in a very empty or sparse DataFrame.
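
As a rough illustration of how this emptiness arises, the sketch below reshapes a handful of ratings into a user-by-movie table. The data and the names ratings and toy_ratings_df are made up for this example and are not part of the movie_lens data; every user-movie pair without a rating shows up as NaN.

```python
import pandas as pd

# Hypothetical toy ratings in long format (not the real movie_lens data)
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 3],
    "title": ["Movie A", "Movie B", "Movie B", "Movie C"],
    "rating": [4.0, 5.0, 3.0, 2.0],
})

# One row per user, one column per movie; unrated pairs become NaN
toy_ratings_df = ratings.pivot(index="userId", columns="title", values="rating")
print(toy_ratings_df)
```

Even with only three users and three movies, more than half of the cells are empty, and the share of empty cells typically grows as users and items are added.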

In this exercise, you will calculate how sparse the movie_lens ratings data is by counting the number of occupied cells and comparing it to the size of the full DataFrame. The DataFrame user_ratings_df that you have used in previous exercises, containing one row per user and one column per movie, has been loaded for you.

This exercise is part of the course

Building Recommendation Engines in Python

Exercise instructions

  • Count the number of non-empty cells in user_ratings_df and store the result as sparsity_count.
  • Count the total number of cells in the user_ratings_df DataFrame and store it as full_count.
  • Calculate the sparsity of the DataFrame by dividing the number of non-empty cells by the total number of cells and print the result.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Count the occupied cells
sparsity_count = user_ratings_df.____().____.____()

# Count all cells
full_count = user_ratings_df.____

# Find the sparsity of the DataFrame
sparsity = ____ / ____
print(sparsity)
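
The same calculation can be sketched on a small stand-in DataFrame. This is one possible approach under stated assumptions, not necessarily the intended solution to the template above: toy_ratings_df and its values are hypothetical, and the ratio follows the instructions as written (non-empty cells divided by all cells).

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 user-by-movie table; NaN marks a missing rating
toy_ratings_df = pd.DataFrame(
    {
        "Movie A": [4.0, np.nan, np.nan],
        "Movie B": [5.0, 3.0, np.nan],
        "Movie C": [np.nan, np.nan, 2.0],
    },
    index=[1, 2, 3],
)

# Count the occupied (non-NaN) cells across the whole table
occupied = toy_ratings_df.notnull().values.sum()

# Count all cells: rows multiplied by columns
total = toy_ratings_df.size

# Fraction of cells that hold a rating
ratio = occupied / total
print(ratio)
```

Here 4 of the 9 cells hold a rating, so the ratio is 4/9. Subtracting this value from 1 gives the share of empty cells instead.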