BoW model for movie taglines

In this exercise, you have been provided with a corpus of more than 7000 movie tag lines. Your job is to generate the bag of words representation bow_matrix for these taglines. For this exercise, we will ignore the text preprocessing step and generate bow_matrix directly.

We will also investigate the shape of the resultant bow_matrix. The first five taglines in corpus have been printed to the console for you to examine.

Diese Übung ist Teil des Kurses

<Kurs>Feature Engineering for NLP in Python</Kurs>

Kurs ansehen

Übungsanweisungen

Import the CountVectorizer class from sklearn.
Instantiate a CountVectorizer object. Name it vectorizer.
Using fit_transform(), generate bow_matrix for corpus.

Interaktive praktische Übung

Versuche dich an dieser Übung, indem du diesen Beispielcode vervollständigst.

# Import CountVectorizer
from sklearn.feature_extraction.text import ____

# Create CountVectorizer object
____ = ____

# Generate matrix of word vectors
bow_matrix = vectorizer.____(____)

# Print the shape of bow_matrix
print(bow_matrix.shape)

Code bearbeiten und ausführen