Comparing all your movies with TF-IDF
Now that you have put in the hard work of getting your TF-IDF data into a usable format, it's time to put it to work finding similarities and generating recommendations.
Because TF-IDF scores are floats rather than Booleans, this time you will use the cosine similarity metric to find the similarities between items. In this exercise, you will generate a matrix of the cosine similarities between every pair of movies and store it in a DataFrame for ease of lookup. This will allow you to compare movies and find recommendations quickly and easily.
The tfidf_summary_df DataFrame you created in the last exercise, containing a row for each movie, has been loaded for you.
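To make the metric concrete, here is a minimal sketch of cosine similarity computed by hand: the dot product of two vectors divided by the product of their lengths. The helper cosine_sim and the three toy vectors are hypothetical and stand in for rows of a TF-IDF DataFrame; they are not part of the exercise data.

```python
import math

def cosine_sim(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical TF-IDF rows over three terms (illustration only)
movie_a = [0.5, 0.0, 0.5]
movie_b = [0.25, 0.0, 0.25]  # same direction as movie_a, half the magnitude
movie_c = [0.0, 1.0, 0.0]    # shares no terms with movie_a

sim_ab = cosine_sim(movie_a, movie_b)
sim_ac = cosine_sim(movie_a, movie_c)
print(round(sim_ab, 6), round(sim_ac, 6))
```

Because the score depends only on the direction of the vectors, movie_a and movie_b come out as a perfect match even though their magnitudes differ, while movie_c, which shares no terms with movie_a, scores zero.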
This exercise is part of the course
Building Recommendation Engines in Python
Exercise instructions
- Find the cosine similarity measures between all movies and assign the results to cosine_similarity_array.
- Create a DataFrame from the cosine_similarity_array with tfidf_summary_df.index as its rows and columns.
- Print the top five rows of the DataFrame and examine the similarity scores.
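The same steps can be sketched on a toy dataset before you try them on the real one. Everything here except cosine_similarity and pd.DataFrame is an assumption made for illustration: toy_tfidf_df, its movie titles, and its term columns are hypothetical.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in for a TF-IDF DataFrame: one row per movie, one column per term
toy_tfidf_df = pd.DataFrame(
    [[0.5, 0.0, 0.5],
     [0.25, 0.0, 0.25],
     [0.0, 1.0, 0.0]],
    index=["Movie A", "Movie B", "Movie C"],
    columns=["space", "romance", "alien"],
)

# Pairwise cosine similarity between every pair of rows: a 3 x 3 NumPy array
toy_similarity_array = cosine_similarity(toy_tfidf_df)

# Label both axes with the movie titles so scores can be looked up by name
toy_similarity_df = pd.DataFrame(
    toy_similarity_array,
    index=toy_tfidf_df.index,
    columns=toy_tfidf_df.index,
)
print(toy_similarity_df.round(2))
```

When you examine a matrix like this, two properties are worth checking: each movie has a similarity of 1 with itself, so the diagonal is all ones, and the matrix is symmetric, since the similarity of A to B equals that of B to A.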
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import cosine_similarity measure
from sklearn.metrics.pairwise import ____
# Create the array of cosine similarity values
cosine_similarity_array = ____(tfidf_summary_df)
# Wrap the array in a pandas DataFrame
cosine_similarity_df = pd.____(cosine_similarity_array, ____=____.____, ____=____.____)
# Print the top 5 rows of the DataFrame
print(cosine_similarity_df.head())
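Once the similarities are in a labeled DataFrame, finding recommendations for a movie comes down to selecting its row and sorting it. The sketch below assumes a small, hand-written similarity DataFrame; the titles and scores are hypothetical values chosen for illustration, not results from the exercise data.

```python
import pandas as pd

# Hypothetical similarity scores for illustration only (symmetric, 1.0 on the diagonal)
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
toy_similarity_df = pd.DataFrame(
    [[1.0, 0.8, 0.1, 0.4],
     [0.8, 1.0, 0.0, 0.3],
     [0.1, 0.0, 1.0, 0.6],
     [0.4, 0.3, 0.6, 1.0]],
    index=titles,
    columns=titles,
)

# Similarity of every movie to "Movie A", excluding "Movie A" itself
scores = toy_similarity_df.loc["Movie A"].drop("Movie A")

# Highest scores first: these are the closest matches
recommendations = scores.sort_values(ascending=False)
print(recommendations.head(2))
```

Dropping the movie's own entry matters, because its similarity to itself is always 1 and would otherwise appear as the top recommendation.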