Creating the TF-IDF DataFrame
Now that you have generated your TF-IDF features, you need to get them into a format that you can use to make recommendations.
You will once again leverage pandas for this: convert the sparse TF-IDF matrix to a dense array and wrap it in a DataFrame.
Since you will be filtering the data by movie title, you can assign the titles to the DataFrame's index.
The df_plots DataFrame has once again been loaded for you. It contains the movies' names in the Title column and their plots in the Plot column.
This exercise is part of the course Building Recommendation Engines in Python.
Exercise instructions
- Create a TfidfVectorizer, then fit and transform it as you did in the previous exercise.
- Wrap the generated vectorized_data (converted to a dense array) in a DataFrame. Use the names of the features generated during the fit and transform phase as its column names, and assign your new DataFrame to tfidf_df.
- Assign the original movie titles to the index of the newly created tfidf_df DataFrame.
One way to complete the sample code is shown below.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
# Instantiate the vectorizer object and transform the plot column
vectorizer = TfidfVectorizer(max_df=0.7, min_df=2)
vectorized_data = vectorizer.fit_transform(df_plots['Plot'])
# Create a DataFrame from the TF-IDF array
# (get_feature_names_out() needs scikit-learn >= 1.0; get_feature_names() was removed in 1.2)
tfidf_df = pd.DataFrame(vectorized_data.toarray(), columns=vectorizer.get_feature_names_out())
# Assign the movie titles to the index and inspect
tfidf_df.index = df_plots['Title']
print(tfidf_df.head())