Exercise

Comparing all your movies with TF-IDF

Now that you have put in the hard work of getting your TF-IDF data into a usable format, it's time to put it to work finding similarities and generating recommendations.

Because TF-IDF scores are floats rather than Booleans, this time you will use the cosine similarity metric to measure how similar items are. In this exercise, you will generate a matrix of the cosine similarities between every pair of movies and store it in a DataFrame for ease of lookup. This will allow you to compare movies and find recommendations quickly and easily.
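To make the metric concrete, here is a minimal sketch of cosine similarity computed by hand with NumPy. The three vectors are made-up stand-ins for TF-IDF rows, and `cosine_sim` is a hypothetical helper written for illustration; neither comes from the exercise data.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between two vectors: dot product over the product of norms."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical TF-IDF rows for three movies over four terms
movie_a = [0.5, 0.0, 0.5, 0.0]
movie_b = [1.0, 0.0, 1.0, 0.0]  # same direction as movie_a, larger magnitude
movie_c = [0.0, 0.7, 0.0, 0.3]  # shares no terms with movie_a

sim_ab = cosine_sim(movie_a, movie_b)  # very close to 1: same mix of terms
sim_ac = cosine_sim(movie_a, movie_c)  # 0: no terms in common
print(sim_ab, sim_ac)
```

Because the metric depends only on the direction of the vectors, two movies with the same mix of terms score close to 1 regardless of how large the individual scores are, while movies with no terms in common score 0.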

The tfidf_df DataFrame you created in the last exercise, which contains a row for each movie, has been loaded for you.

Instructions
100 XP
  • Find the cosine similarity measures between all movies and assign the results to cosine_similarity_array.
  • Create a DataFrame from the cosine_similarity_array with tfidf_df.index as both its row and column labels.
  • Print the top five rows of the DataFrame and examine the similarity scores.
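
The steps above might be sketched as follows. This is one possible approach, not necessarily the intended solution: it assumes scikit-learn's `cosine_similarity` from `sklearn.metrics.pairwise`, and it builds a tiny made-up `tfidf_df` so the snippet runs on its own. The name `cosine_similarity_df` is also an assumption, since the instructions do not name the resulting DataFrame.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in for the preloaded tfidf_df (one row per movie)
tfidf_df = pd.DataFrame(
    [[0.6, 0.8, 0.0],
     [0.6, 0.8, 0.0],
     [0.0, 0.0, 1.0]],
    index=["Movie A", "Movie B", "Movie C"],
    columns=["term_1", "term_2", "term_3"],
)

# Pairwise cosine similarity between every pair of rows (movies)
cosine_similarity_array = cosine_similarity(tfidf_df)

# Label both axes with the movie titles so scores can be looked up by name
cosine_similarity_df = pd.DataFrame(
    cosine_similarity_array,
    index=tfidf_df.index,
    columns=tfidf_df.index,
)

# Inspect the first five rows (only three exist in this toy example)
print(cosine_similarity_df.head())
```

With both axes labeled, a lookup such as `cosine_similarity_df.loc["Movie A"].sort_values(ascending=False)` ranks every movie by its similarity to a chosen title.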