Building Recommendation Engines in Python

Exercise

Limited data in your rows

This data sparsity can cause problems for techniques like K-nearest neighbors (KNN), as discussed in the last chapter. KNN needs to find the k most similar users who have rated an item, but if k or fewer users have rated it, every one of those users ends up among the "most similar", no matter how similar they actually are.
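To make the problem concrete, here is a minimal sketch in plain Python. The user names, similarity scores, and ratings are invented for illustration and do not come from the course data. With k = 3 and only two users who rated the movie, both raters become neighbors even though neither is similar to the target user.

```python
# Hypothetical similarity of every other user to the target user
similarities = {"user_a": 0.91, "user_b": 0.05, "user_c": -0.40, "user_d": 0.78}

# Hypothetical ratings for one sparsely rated movie: only two users rated it
ratings = {"user_b": 2.0, "user_c": 5.0}

k = 3

# Only users who actually rated the movie can serve as neighbors
candidates = {user: sim for user, sim in similarities.items() if user in ratings}

# Keep the k most similar candidates; with k or fewer raters, this keeps all of them
neighbors = sorted(candidates, key=candidates.get, reverse=True)[:k]

# A simple unweighted average of the neighbors' ratings
predicted = sum(ratings[user] for user in neighbors) / len(neighbors)

print(neighbors)
print(predicted)
```

The genuinely similar users (user_a and user_d) never rated the movie, so they cannot contribute, and the prediction is driven entirely by dissimilar users.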

In this exercise, you will count how often each movie in the user_ratings_df DataFrame has been given a rating, and then see how many have only one or two ratings.

Instructions 1/3

  • 1
    • Count the number of non-empty cells in each column of user_ratings_df and store it as occupied_count.
  • 2
    • Sort occupied_count from low to high and store the result as sorted_occupied_count. Looking at the resulting sorted Series, note the number of movies with only one rating.
  • 3
    • Create a histogram of the sorted_occupied_count Series you just created. matplotlib.pyplot has been loaded as plt.
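The three steps above could be sketched as follows. This is one possible approach, not necessarily the course's reference solution. The small user_ratings_df built here is a made-up stand-in for the exercise's DataFrame, assuming users as rows, movies as columns, and NaN wherever a user has not rated a movie.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the exercise's DataFrame (rows: users, columns: movies)
user_ratings_df = pd.DataFrame(
    {
        "Movie A": [4.0, np.nan, 5.0, 3.0],
        "Movie B": [np.nan, np.nan, 2.0, np.nan],
        "Movie C": [1.0, np.nan, np.nan, 4.0],
        "Movie D": [np.nan, 5.0, np.nan, np.nan],
    },
    index=["user_1", "user_2", "user_3", "user_4"],
)

# Step 1: count the non-empty cells in each column (ratings per movie)
occupied_count = user_ratings_df.notna().sum()

# Step 2: sort the counts from low to high
sorted_occupied_count = occupied_count.sort_values()
print(sorted_occupied_count)

# Count how many movies have only one or two ratings
few_ratings = int((sorted_occupied_count <= 2).sum())
print(few_ratings)

# Step 3: plot a histogram of the ratings-per-movie counts
sorted_occupied_count.hist()
plt.show()
```

user_ratings_df.count() would give the same per-column counts as notna().sum(), since count() also skips missing values.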