LoslegenKostenlos loslegen

Comparing all your movies with TF-IDF

Now that you have put in the hard work of getting your TF-IDF data into a usable format, it's time to put it to work generating finding similarities and generating recommendations.

This time as you are using TF-IDF scores (which are floats as opposed to Booleans) you will use the cosine similarity metric to find the similarities between items. In this exercise, you will generate a matrix of all of the movie cosine similarities and store them in a DataFrame for ease of lookup. This will allow you to compare movies and find recommendations quickly and easily.

The tfidf_df DataFrame you created in the last exercise containing a row for each movie has been loaded for you.

Diese Übung ist Teil des Kurses

Building Recommendation Engines in Python

Kurs anzeigen

Anleitung zur Übung

  • Find the cosine similarity measures between all movies and assign the results to cosine_similarity_array.
  • Create a DataFrame from the cosine_similarity_array with tfidf_summary_df.index as its rows and columns.
  • Print the top five rows of the DataFrame and examine the similarity scores.

Interaktive Übung

Vervollständige den Beispielcode, um diese Übung erfolgreich abzuschließen.

# Import cosine_similarity measure
from sklearn.metrics.pairwise import ____

# Create the array of cosine similarity values
cosine_similarity_array = ____(tfidf_summary_df)

# Wrap the array in a pandas DataFrame
cosine_similarity_df = pd.____(cosine_similarity_array, ____=____.____, ____=____.____)

# Print the top 5 rows of the DataFrame
print(cosine_similarity_df.head())
Code bearbeiten und ausführen