BoW model for movie taglines
In this exercise, you have been provided with a corpus of more than 7000 movie taglines. Your job is to generate the bag-of-words representation bow_matrix for these taglines. For this exercise, we will skip the text preprocessing step and generate bow_matrix directly.
We will also investigate the shape of the resulting bow_matrix. The first five taglines in corpus have been printed to the console for you to examine.
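To make the idea of a bag-of-words matrix concrete, here is a minimal pure-Python sketch of what gets computed: one row per tagline, one column per distinct word, and each cell holding a word count. The three taglines are hypothetical stand-ins, not lines from the exercise corpus. The tokenizer is an assumption meant to approximate CountVectorizer's defaults (lowercasing, and the default token pattern that keeps only tokens of two or more word characters). It is not the library's actual implementation.

```python
import re
from collections import Counter

# Hypothetical stand-in taglines; the exercise supplies the real `corpus`
corpus = [
    "A hero will rise.",
    "The hero falls, the city rises.",
    "No one can stop them.",
]

# Approximates CountVectorizer's defaults: lowercase, keep tokens of 2+ word chars
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def tokenize(text):
    return TOKEN_PATTERN.findall(text.lower())

# Vocabulary: every distinct token in the corpus, sorted alphabetically
vocabulary = sorted({tok for doc in corpus for tok in tokenize(doc)})
column = {word: idx for idx, word in enumerate(vocabulary)}

# Build one count row per document
bow_rows = []
for doc in corpus:
    row = [0] * len(vocabulary)
    for word, n in Counter(tokenize(doc)).items():
        row[column[word]] = n
    bow_rows.append(row)

# (number of documents, vocabulary size), analogous to bow_matrix.shape
print((len(bow_rows), len(vocabulary)))
```

Note that the single-letter word "a" never enters the vocabulary. The default token pattern discards it.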
This exercise is part of the course Feature Engineering for NLP in Python.
Instructions
- Import the CountVectorizer class from sklearn (sklearn.feature_extraction.text).
- Instantiate a CountVectorizer object. Name it vectorizer.
- Using fit_transform(), generate bow_matrix for corpus.
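The third step relies on fit_transform(), which both learns the vocabulary and produces the count matrix. The sketch below shows that this is equivalent to calling fit() and then transform(). It also shows that a fitted vectorizer maps new text onto the same columns and silently drops words it never saw. The corpus and the extra tagline are hypothetical stand-ins.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in taglines (the exercise preloads the real `corpus`)
corpus = [
    "A hero will rise.",
    "The hero falls, the city rises.",
    "No one can stop them.",
]

# fit_transform() learns the vocabulary and counts words in one call
vectorizer = CountVectorizer()
bow_matrix = vectorizer.fit_transform(corpus)

# Same result in two steps: fit() learns the vocabulary, transform() counts
two_step = CountVectorizer().fit(corpus).transform(corpus)
print(np.array_equal(bow_matrix.toarray(), two_step.toarray()))  # True

# transform() on new text reuses the learned columns; unseen words are ignored
new_row = vectorizer.transform(["The villain will rise again."])
print(new_row.shape)  # (1, 12): same number of columns as bow_matrix
```

Fitting once and transforming later keeps every row aligned to the same vocabulary. That alignment matters whenever new taglines must be compared against the original matrix.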
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Import CountVectorizer
from sklearn.feature_extraction.text import ____
# Create CountVectorizer object
____ = ____
# Generate matrix of word vectors
bow_matrix = vectorizer.____(____)
# Print the shape of bow_matrix
print(bow_matrix.shape)