BoW model for movie taglines
In this exercise, you have been provided with a corpus of more than 7000 movie taglines. Your job is to generate the bag-of-words representation bow_matrix for these taglines. For this exercise, we will skip the text preprocessing step and generate bow_matrix directly.
We will also investigate the shape of the resulting bow_matrix. The first five taglines in corpus have been printed to the console for you to examine.
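To make the idea of a bag-of-words matrix and its shape concrete, here is a minimal pure-Python sketch. The two-sentence `toy_corpus` is a hypothetical stand-in for the real taglines, and the whitespace tokenization is an assumption made for illustration; it is not how CountVectorizer tokenizes text.

```python
from collections import Counter

# Hypothetical toy corpus standing in for the real taglines
toy_corpus = [
    "the lion is the king",
    "the king of the jungle",
]

# Naive whitespace tokenization, with no preprocessing
tokenized = [doc.split() for doc in toy_corpus]

# Vocabulary: the sorted unique tokens across all documents
vocabulary = sorted({token for doc in tokenized for token in doc})

# One row per document, one column per vocabulary term;
# each cell holds how often that term occurs in that document
toy_bow = [[Counter(doc)[term] for term in vocabulary] for doc in tokenized]

print(vocabulary)                       # ['is', 'jungle', 'king', 'lion', 'of', 'the']
print(toy_bow)                          # [[1, 0, 1, 1, 0, 2], [0, 1, 1, 0, 1, 2]]
print((len(toy_bow), len(vocabulary)))  # (2, 6)
```

The shape is (number of documents, number of unique terms), so for the real corpus the first dimension should equal the number of taglines.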
This exercise is part of the course
Feature Engineering for NLP in Python
Exercise instructions
- Import the CountVectorizer class from sklearn.
- Instantiate a CountVectorizer object. Name it vectorizer.
- Using fit_transform(), generate bow_matrix for corpus.
Hands-on interactive exercise
Try this exercise by completing the sample code.
# Import CountVectorizer
from sklearn.feature_extraction.text import ____
# Create CountVectorizer object
____ = ____
# Generate matrix of word vectors
bow_matrix = vectorizer.____(____)
# Print the shape of bow_matrix
print(bow_matrix.shape)
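
For reference, the same workflow can be sketched end to end on a small stand-in corpus. The `toy_corpus` below is hypothetical and is not the exercise's `corpus` variable, so the printed shape will differ from the one you get on the real taglines. The sketch relies on CountVectorizer's default settings, which lowercase the text and keep tokens of two or more word characters.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus; the exercise supplies its own `corpus`
toy_corpus = [
    "The lion is the king",
    "The king of the jungle",
]

# Create the vectorizer with default settings
vectorizer = CountVectorizer()

# Learn the vocabulary and build the document-term count matrix (sparse)
bow_matrix = vectorizer.fit_transform(toy_corpus)

# Shape is (number of documents, number of unique terms)
print(bow_matrix.shape)  # (2, 6)

# Terms in column order: vocabulary_ maps each term to its column index
print(sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get))

# Dense view of the counts, reasonable only for a corpus this small
print(bow_matrix.toarray())
```

Because fit_transform() returns a sparse matrix, calling toarray() on the full tagline corpus may use a lot of memory; inspecting shape alone avoids that.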