tf-idf vectors for TED talks
In this exercise, you have been given a corpus, ted, which contains the transcripts of 500 TED Talks. Your task is to generate the tf-idf vectors for these talks.
In a later lesson, we will use these vectors to generate recommendations of similar talks based on the transcript.
This exercise is part of the course
Feature Engineering for NLP in Python
Instructions
- Import TfidfVectorizer from sklearn.
- Create a TfidfVectorizer object. Name it vectorizer.
- Generate tfidf_matrix for ted using the fit_transform() method.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Import TfidfVectorizer
from ____ import ____
# Create TfidfVectorizer object
____
# Generate matrix of word vectors
tfidf_matrix = vectorizer.____(____)
# Print the shape of tfidf_matrix
print(tfidf_matrix.shape)
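
For reference, one way the completed steps might look is sketched below. The real ted corpus is not reproduced here, so the sketch swaps in a tiny hypothetical list of three made-up strings under the same name; only the import path, the TfidfVectorizer class, and fit_transform() come from scikit-learn itself.

```python
# Import TfidfVectorizer from scikit-learn's text feature extraction module
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for the `ted` corpus (assumption: the real corpus
# holds 500 transcripts; these three short strings are invented)
ted = [
    "the future of energy is renewable",
    "how great leaders inspire action",
    "the power of vulnerability",
]

# Create TfidfVectorizer object with default settings
vectorizer = TfidfVectorizer()

# Learn the vocabulary and idf weights, then build the document-term matrix
tfidf_matrix = vectorizer.fit_transform(ted)

# Rows are documents, columns are distinct vocabulary terms
print(tfidf_matrix.shape)
```

The first dimension of the printed shape is the number of documents and the second is the size of the learned vocabulary, so with the actual corpus the first value should be 500.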