Exercise

BoW model for movie taglines

In this exercise, you are given a corpus of more than 7,000 movie taglines. Your job is to generate the bag-of-words representation, bow_matrix, for these taglines. We will skip the text preprocessing step and generate bow_matrix directly from the raw text.

We will also investigate the shape of the resultant bow_matrix. The first five taglines in corpus have been printed to the console for you to examine.

Instructions

  • Import the CountVectorizer class from sklearn.feature_extraction.text.
  • Instantiate a CountVectorizer object. Name it vectorizer.
  • Using fit_transform(), generate bow_matrix for corpus.
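
The three steps above can be sketched as follows. This is a minimal illustration, not the exercise solution: the real corpus of movie taglines is not reproduced here, so a short three-sentence list stands in for it (an assumption made only for this example), and the shape you see on the full corpus will differ.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for the exercise's `corpus` of movie taglines;
# the real corpus has more than 7,000 entries.
corpus = [
    "The lion is the king of the jungle",
    "Lions have lifespans of a decade",
    "The lion is an endangered species",
]

# Instantiate the vectorizer with its default settings
# (lowercasing on, tokens of two or more characters).
vectorizer = CountVectorizer()

# Learn the vocabulary and build the document-term matrix in one call.
bow_matrix = vectorizer.fit_transform(corpus)

# Rows are documents, columns are distinct vocabulary terms.
print(bow_matrix.shape)
```

The first dimension of the shape is the number of documents and the second is the vocabulary size, so on the full tagline corpus the row count should match the number of taglines. fit_transform() returns a SciPy sparse matrix; calling bow_matrix.toarray() converts it to a dense array if you want to inspect individual counts.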