Creating the TF-IDF DataFrame

Now that you have generated your TF-IDF features, you need to get them into a format that you can use to make recommendations. You will once again use pandas for this and wrap the array in a DataFrame. Because you will filter the data by movie title, assign the titles to the DataFrame's index.

The df_plots DataFrame has once again been loaded for you. It contains movies' names in the Title column and their plots in the Plot column.
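
To illustrate why the titles belong in the index, here is a minimal sketch using made-up numbers in place of real TF-IDF scores. The titles, column names, and values are hypothetical and are not taken from df_plots.

```python
import numpy as np
import pandas as pd

# Hypothetical scores standing in for TF-IDF output (not real data)
scores = np.array([[0.0, 0.8],
                   [0.5, 0.0]])

# Wrap the array in a DataFrame with one column per term
demo_df = pd.DataFrame(scores, columns=["ocean", "space"])

# Use the (hypothetical) movie titles as row labels
demo_df.index = ["Space Quest", "Ocean Deep"]

# With titles in the index, a movie's row can be selected by name
print(demo_df.loc["Space Quest"])
```

Once the titles are the row labels, `.loc` selects a movie's feature vector directly by name instead of by integer position.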

This exercise is part of the course

Building Recommendation Engines in Python

Exercise instructions

  • Create a TfidfVectorizer and fit and transform it as you did in the previous exercise.
  • Wrap the generated vectorized_data in a DataFrame. Use the names of the features generated during the fit and transform phase as its column names and assign your new DataFrame to tfidf_df.
  • Assign the original movie titles to the index of the newly created tfidf_df DataFrame.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Instantiate the vectorizer object, then fit and transform the Plot column
vectorizer = ____(max_df=0.7, min_df=2)
vectorized_data = vectorizer.____(df_plots['Plot']) 

# Create a DataFrame from the TF-IDF array
tfidf_df = pd.____(____.toarray(), columns=vectorizer.____())

# Assign the movie titles to the index and inspect
tfidf_df.____ = ____['Title']
print(tfidf_df.head())