Step 2: Building a vectorizer

In this exercise, you will build a Tfidf transformation of the review column in the reviews dataset. You will specify four arguments: the n-gram range, the stop words, the token pattern, and the size of the vocabulary.

This is the last step before we train a classifier to predict the sentiment of a review.

Make sure you set the maximum number of features correctly: a very large vocabulary could disconnect your session.

Import the Tfidf vectorizer and the default list of English stop words.
Build the Tfidf vectorizer, specifying the following arguments in this order: use the default list of English stop words as stop words; use uni- and bi-grams as n-grams; set the maximum number of features to 200; capture only words using the specified pattern.
Create a DataFrame using the Tfidf vectorizer.
