
Comparing all your movies with TF-IDF

Now that you have put in the hard work of getting your TF-IDF data into a usable format, it's time to put it to work finding similarities and generating recommendations.

Because TF-IDF scores are floats rather than Booleans, you will use the cosine similarity metric to measure the similarity between items. In this exercise, you will generate a matrix of the cosine similarities between all pairs of movies and store it in a DataFrame for ease of lookup. This will allow you to compare movies and find recommendations quickly and easily.
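To make the metric concrete, here is a minimal sketch that computes the cosine similarity of two vectors by hand and checks it against scikit-learn's `cosine_similarity`. The two vectors (`movie_a`, `movie_b`) are invented for illustration and are not taken from the course data.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical TF-IDF rows for two movies (values invented for illustration)
movie_a = np.array([0.0, 0.5, 0.5, 0.0])
movie_b = np.array([0.0, 0.5, 0.0, 0.5])

# Cosine similarity: dot product divided by the product of the vector lengths
manual = movie_a @ movie_b / (np.linalg.norm(movie_a) * np.linalg.norm(movie_b))

# scikit-learn expects 2-D input, so each vector is reshaped to a single row
from_sklearn = cosine_similarity(movie_a.reshape(1, -1), movie_b.reshape(1, -1))[0, 0]

print(manual, from_sklearn)
```

Both values should agree. Because TF-IDF scores are never negative, the similarity between two movies falls between 0 (no shared terms) and 1 (identical direction).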

The tfidf_summary_df DataFrame you created in the last exercise, which contains a row for each movie, has been loaded for you.

This exercise is part of the course

Building Recommendation Engines in Python


Exercise instructions

  • Find the cosine similarity measures between all movies and assign the results to cosine_similarity_array.
  • Create a DataFrame from the cosine_similarity_array with tfidf_summary_df.index as its rows and columns.
  • Print the top five rows of the DataFrame and examine the similarity scores.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import cosine_similarity measure
from sklearn.metrics.pairwise import ____

# Create the array of cosine similarity values
cosine_similarity_array = ____(tfidf_summary_df)

# Wrap the array in a pandas DataFrame
cosine_similarity_df = pd.____(cosine_similarity_array, ____=____.____, ____=____.____)

# Print the top 5 rows of the DataFrame
print(cosine_similarity_df.head())
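
As a rough illustration of how the finished similarity DataFrame can be used for lookup, the sketch below runs the same steps on a tiny made-up table. The name `toy_tfidf_df`, the movie titles, and the scores are all assumptions standing in for `tfidf_summary_df`, which is only available inside the exercise environment.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-in for tfidf_summary_df: rows are movies, columns are terms.
# Titles and scores are invented for illustration only.
toy_tfidf_df = pd.DataFrame(
    [[0.7, 0.7, 0.0],
     [0.6, 0.8, 0.0],
     [0.0, 0.0, 1.0]],
    index=["Movie A", "Movie B", "Movie C"],
    columns=["space", "alien", "romance"],
)

# Pairwise cosine similarities between every pair of rows (movies)
similarity_array = cosine_similarity(toy_tfidf_df)

# Label both axes with the movie titles so scores can be looked up by name
similarity_df = pd.DataFrame(
    similarity_array, index=toy_tfidf_df.index, columns=toy_tfidf_df.index
)

# Rank the other movies by their similarity to "Movie A"
ranked = similarity_df.loc["Movie A"].drop("Movie A").sort_values(ascending=False)
print(ranked)
```

Labelling both the rows and the columns with the same index is what makes the lookup convenient: a single `.loc` call returns every score for one movie, and sorting that row gives a simple ranked list of candidates.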