BoW model for movie taglines
In this exercise, you have been provided with a corpus of more than 7,000 movie taglines. Your job is to generate the bag-of-words representation bow_matrix for these taglines. For this exercise, we will skip the text preprocessing step and generate bow_matrix directly.
We will also investigate the shape of the resultant bow_matrix. The first five taglines in corpus have been printed to the console for you to examine.
This exercise is part of the course
Feature Engineering for NLP in Python
Instructions
- Import the CountVectorizer class from sklearn.
- Instantiate a CountVectorizer object. Name it vectorizer.
- Using fit_transform(), generate bow_matrix for corpus.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Import CountVectorizer
from sklearn.feature_extraction.text import ____
# Create CountVectorizer object
____ = ____
# Generate matrix of word vectors
bow_matrix = vectorizer.____(____)
# Print the shape of bow_matrix
print(bow_matrix.shape)
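To see what the shape of a bag-of-words matrix means, here is a minimal sketch on a tiny made-up corpus. The names toy_corpus, toy_vectorizer, and toy_bow and the three sentences are assumptions for illustration only; they are not the course's tagline data. It assumes CountVectorizer's default settings, which lowercase the text and keep tokens of two or more characters.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical three-document corpus (not the course's tagline data)
toy_corpus = [
    "The adventure begins",
    "The adventure never ends",
    "Love never dies",
]

# Default settings: lowercase the text, keep tokens of 2+ characters
toy_vectorizer = CountVectorizer()

# Learn the vocabulary and build the document-term count matrix in one step
toy_bow = toy_vectorizer.fit_transform(toy_corpus)

# Rows are documents, columns are unique vocabulary terms
print(toy_bow.shape)  # (3, 7)

# vocabulary_ maps each term to its column index; list terms in column order
terms = sorted(toy_vectorizer.vocabulary_, key=toy_vectorizer.vocabulary_.get)
print(terms)

# fit_transform returns a sparse matrix; convert to dense only for small data
print(toy_bow.toarray())
```

The first dimension of the shape is the number of documents and the second is the vocabulary size, so for the tagline corpus you would expect one row per tagline and one column per distinct word.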