
Comparing all your movies with TF-IDF

Now that you have put in the hard work of getting your TF-IDF data into a usable format, it's time to put it to work finding similarities and generating recommendations.

Because TF-IDF scores are floats rather than Booleans, this time you will use the cosine similarity metric to find the similarities between items. In this exercise, you will generate a matrix of the cosine similarities between all pairs of movies and store it in a DataFrame for ease of lookup. This will allow you to compare movies and find recommendations quickly and easily.
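To make the metric concrete, here is a minimal sketch of cosine similarity computed by hand with NumPy. The helper `cosine_sim` and the three toy vectors are hypothetical and are not part of the exercise data; they only illustrate that the score depends on the direction of the vectors, not their magnitude.

```python
import numpy as np

def cosine_sim(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical TF-IDF rows for three movies over four terms
movie_a = [0.5, 0.0, 0.5, 0.0]
movie_b = [1.0, 0.0, 1.0, 0.0]  # same direction as movie_a, larger magnitude
movie_c = [0.0, 0.7, 0.0, 0.3]  # shares no terms with movie_a

sim_ab = cosine_sim(movie_a, movie_b)  # close to 1: same term profile
sim_ac = cosine_sim(movie_a, movie_c)  # 0: no overlapping terms
print(sim_ab, sim_ac)
```

Since TF-IDF scores are never negative, the similarity between two movies falls between 0 (no shared terms) and 1 (identical term profile).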

The tfidf_summary_df DataFrame you created in the last exercise, containing a row for each movie, has been loaded for you.

This exercise is part of the course

Building Recommendation Engines in Python


Instructions

  • Find the cosine similarity measures between all movies and assign the results to cosine_similarity_array.
  • Create a DataFrame from the cosine_similarity_array with tfidf_summary_df.index as its rows and columns.
  • Print the top five rows of the DataFrame and examine the similarity scores.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Import cosine_similarity measure
from sklearn.metrics.pairwise import ____

# Create the array of cosine similarity values
cosine_similarity_array = ____(tfidf_summary_df)

# Wrap the array in a pandas DataFrame
cosine_similarity_df = pd.____(cosine_similarity_array, ____=____.____, ____=____.____)

# Print the top 5 rows of the DataFrame
print(cosine_similarity_df.head())
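For reference, the same workflow can be sketched on a small toy table. The DataFrame `toy_tfidf_df`, its movie titles, and its term columns are made up for illustration and only stand in for `tfidf_summary_df`; the final lookup step is an assumption about how the resulting matrix might be used, not part of the exercise.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in for tfidf_summary_df: one row per movie, one column per term
toy_tfidf_df = pd.DataFrame(
    [[0.6, 0.8, 0.0],
     [0.8, 0.6, 0.0],
     [0.0, 0.0, 1.0]],
    index=["Movie A", "Movie B", "Movie C"],
    columns=["space", "robot", "romance"],
)

# Pairwise cosine similarities between all rows (movies)
similarity_array = cosine_similarity(toy_tfidf_df)

# Label both axes with the movie titles so scores can be looked up by name
similarity_df = pd.DataFrame(
    similarity_array,
    index=toy_tfidf_df.index,
    columns=toy_tfidf_df.index,
)
print(similarity_df.head())

# Rank the other movies by similarity to "Movie A", dropping the movie itself
most_similar = (
    similarity_df.loc["Movie A"]
    .drop("Movie A")
    .sort_values(ascending=False)
)
print(most_similar)
```

Labelling both the rows and the columns with the movie index means any pair of movies can be compared by title, for example `similarity_df.loc["Movie A", "Movie B"]`.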