Step 2: Building a vectorizer

In this exercise, you will build a TF-IDF transformation of the review column in the reviews dataset. You will set four arguments of the vectorizer: the n-gram range, the stop words, the token pattern, and the vocabulary size (the maximum number of features).

This is the last step before we train a classifier to predict the sentiment of a review.

Instructions
  • Import the TF-IDF vectorizer and the default list of English stop words.
  • Build the TF-IDF vectorizer, specifying the following arguments in this order: use the default list of English stop words as the stop words; use uni- and bi-grams as the n-grams; set the maximum number of features to 200; capture only words, using the specified token pattern.
  • Create a DataFrame from the output of the TF-IDF vectorizer.