Plot recommendation engine
In this exercise, we will build a recommendation engine that suggests movies based on similarity of plot lines. You have been given a get_recommendations() function that takes in the title of a movie, a similarity matrix and an indices series as its arguments and outputs a list of most similar movies. indices has already been provided to you.
You have also been given a movie_plots Series that contains the plot lines of several movies. Your task is to generate a cosine similarity matrix for the tf-idf vectors of these plots.
Afterwards, we will check the quality of our engine by generating recommendations for one of my favorite movies, The Dark Knight Rises.
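The exercise does not show the body of get_recommendations(). Below is a minimal sketch of how such a function might work, assuming indices is a pandas Series that maps each title to its row position in the similarity matrix. The top_n parameter and the toy data are hypothetical and not part of the exercise.

```python
import numpy as np
import pandas as pd


def get_recommendations(title, cosine_sim, indices, top_n=10):
    """Return the titles most similar to `title` (hypothetical sketch).

    Assumes `indices` maps movie title -> row position in `cosine_sim`.
    """
    # Row of the similarity matrix that belongs to the requested movie
    idx = indices[title]
    # Pair every movie position with its similarity to the requested movie
    scores = list(enumerate(cosine_sim[idx]))
    # Most similar first
    scores = sorted(scores, key=lambda pair: pair[1], reverse=True)
    # Drop the movie itself and keep the top_n remaining entries
    positions = [i for i, _ in scores if i != idx][:top_n]
    # Reverse lookup: row position -> title
    titles = pd.Series(indices.index, index=indices.values)
    return titles.loc[positions].tolist()


# Toy data (made up for illustration)
toy_indices = pd.Series([0, 1, 2], index=["A", "B", "C"])
toy_sim = np.array([
    [1.0, 0.2, 0.9],
    [0.2, 1.0, 0.1],
    [0.9, 0.1, 1.0],
])
recommended = get_recommendations("A", toy_sim, toy_indices, top_n=2)
print(recommended)
```

The sketch excludes the queried movie from its own results, since every movie has similarity 1 with itself.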
This exercise is part of the course
Feature Engineering for NLP in Python
Exercise instructions
- Initialize a TfidfVectorizer with English stop_words. Name it tfidf.
- Construct tfidf_matrix by fitting and transforming the movie plot data using fit_transform().
- Generate the cosine similarity matrix cosine_sim using tfidf_matrix. Don't use cosine_similarity()!
- Use get_recommendations() to generate recommendations for 'The Dark Knight Rises'.
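One way to satisfy the third instruction is sketched here as an assumption rather than the official solution. TfidfVectorizer L2-normalizes each row by default (norm='l2'), so the plain dot product computed by scikit-learn's linear_kernel() already equals the cosine similarity. The three plot strings below are made up for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Made-up plot lines standing in for movie_plots
toy_plots = [
    "A masked vigilante protects a dark city from a ruthless criminal.",
    "A young wizard attends a school of magic and fights a dark lord.",
    "A criminal mastermind terrorizes the city until a vigilante stops him.",
]

# Rows of the tf-idf matrix have unit L2 norm by default
tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(toy_plots)

# Dot products of unit-length vectors are cosine similarities
dot_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
cos_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

same = np.allclose(dot_sim, cos_sim)
print(same)
```

Skipping the normalization step that cosine_similarity() performs internally is what makes the dot-product route cheaper on vectors that are already normalized.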
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Initialize the TfidfVectorizer
tfidf = ____(____='english')
# Construct the TF-IDF matrix
tfidf_matrix = tfidf.____(movie_plots)
# Generate the cosine similarity matrix
cosine_sim = ____(tfidf_matrix, tfidf_matrix)
# Generate recommendations
print(get_recommendations(____, cosine_sim, indices))