Building Recommendation Engines in Python


Exercise

Comparing all your movies at once

Finding the Jaccard similarity between two individual movies in your dataset works well for small-scale analyses. On larger datasets, however, computing similarities one pair at a time can be too slow for making recommendations.

In this exercise, you will find the similarities between all movies and store them in a DataFrame for quick and easy lookup.

To find the similarities between the rows of a DataFrame, you could loop over all pairs and calculate each one individually. It is more efficient to use the pdist() (pairwise distance) function from scipy.spatial.distance, which computes the distance for every pair of rows in a single call.
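To see that pdist() does the same work as an explicit loop over pairs, here is a minimal sketch on a small hypothetical boolean matrix (4 movies as rows, 5 genres as columns; the data is invented for illustration). It compares a double loop using scipy's jaccard() with a single pdist(..., metric="jaccard") call.

```python
import numpy as np
from scipy.spatial.distance import jaccard, pdist

# Hypothetical toy data: 4 movies (rows) x 5 genres (columns), True = movie has genre
genres = np.array([
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
], dtype=bool)

n = genres.shape[0]

# Naive approach: compute the distance for every unordered pair (i, j) with i < j
loop_distances = []
for i in range(n):
    for j in range(i + 1, n):
        loop_distances.append(jaccard(genres[i], genres[j]))
loop_distances = np.array(loop_distances)

# Vectorized approach: pdist returns the same pairs in the same (i < j) order
pdist_distances = pdist(genres, metric="jaccard")

print(np.allclose(loop_distances, pdist_distances))  # True
```

Both produce one value per pair, but pdist() avoids the Python-level loop, which matters as the number of movies grows (the number of pairs grows roughly with the square of the number of rows).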

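pdist() returns a condensed, one-dimensional array with one distance per pair of rows. You can reshape it into the desired square matrix using squareform() from the same module. Since you want similarity values rather than distances, subtract each distance from 1 (Jaccard similarity = 1 - Jaccard distance).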
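A minimal sketch of the reshaping step, again on a hypothetical 4 x 5 boolean matrix: the condensed output holds n * (n - 1) / 2 values, squareform() expands it to an n x n matrix, and subtracting from 1 turns distances into similarities.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical toy data: 4 movies (rows) x 5 genres (columns)
genres = np.array([
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
], dtype=bool)

condensed = pdist(genres, metric="jaccard")
print(condensed.shape)  # (6,) -> 4 * 3 / 2 pairs

# Expand to a symmetric 4 x 4 distance matrix with zeros on the diagonal
square_distances = squareform(condensed)
print(square_distances.shape)  # (4, 4)

# Convert distances to similarities; each movie is now 1.0 similar to itself
similarities = 1 - square_distances
print(np.round(similarities, 2))
```

Because squareform() fills the diagonal with zero distances, the resulting similarity matrix has 1.0 on its diagonal and is symmetric, so entry [i, j] can be read in either direction.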

movie_cross_table has once again been loaded for you.

Instructions

  • Find the Jaccard distances between all movies, convert them into a square matrix of similarities, and assign the result to jaccard_similarity_array.
  • Create a DataFrame from jaccard_similarity_array with movie_cross_table.index as both its rows and its columns.
  • Print the first 5 rows of the DataFrame and examine the similarity scores.
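Putting the steps together, here is one possible sketch. Since the preloaded data is not shown, it builds a small hypothetical movie_genre_df with assumed columns "name" and "genre_list" and derives movie_cross_table from it with pd.crosstab(); the movie titles and genres are illustrative only. The DataFrame is labeled with movie_cross_table.index, because that table's rows are the movies.

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Hypothetical stand-in for the preloaded data: one row per (movie, genre) pair.
# The column names "name" and "genre_list" are assumptions for this sketch.
movie_genre_df = pd.DataFrame({
    "name": ["Toy Story", "Toy Story", "Toy Story", "Jumanji", "Jumanji", "Heat", "Heat"],
    "genre_list": ["Animation", "Comedy", "Children", "Adventure", "Children", "Action", "Crime"],
})

# Movies as rows, genres as columns, 1 where the movie has that genre
movie_cross_table = pd.crosstab(movie_genre_df["name"], movie_genre_df["genre_list"])

# Jaccard distances between every pair of movies (condensed 1-D form)
jaccard_distances = pdist(movie_cross_table.values.astype(bool), metric="jaccard")

# Square matrix of similarities: similarity = 1 - distance
jaccard_similarity_array = 1 - squareform(jaccard_distances)

# Label rows and columns with movie names so lookups can be done by title
jaccard_similarity_df = pd.DataFrame(
    jaccard_similarity_array,
    index=movie_cross_table.index,
    columns=movie_cross_table.index,
)

print(jaccard_similarity_df.head())

# Quick lookup of the similarity between two specific movies
print(jaccard_similarity_df.loc["Toy Story", "Jumanji"])  # 0.25
```

With the matrix precomputed, any pairwise similarity becomes a single .loc lookup instead of a fresh calculation, which is the speedup the exercise is after.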