BOW using product reviews

You have practiced building a bag-of-words (BOW) on a small dataset. Now you will apply it to a sample of Amazon product reviews. The data has been imported for you and is called reviews. It contains two columns. The first, score, is 0 when the review is negative and 1 when it is positive. The second, review, contains the text of the review that a customer wrote. Feel free to explore the data in the IPython Shell.

Your task is to build a BOW vocabulary from the review column.

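Remember that you can call the .get_feature_names() method on the fitted vectorizer to obtain a list of all the vocabulary elements. (In scikit-learn 1.0 and later the equivalent method is .get_feature_names_out(); .get_feature_names() was removed in version 1.2.)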

Create a CountVectorizer object, specifying the maximum number of features.
Fit the vectorizer on the review column.
Transform the review column using the fitted vectorizer.
Create a DataFrame from the resulting sparse matrix, converted to a dense array, and use the vocabulary as the column names.
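
The steps above can be sketched roughly as follows. This is a minimal illustration, not the exercise solution: the three-row reviews DataFrame is a made-up stand-in for the course data, and max_features=5 is an arbitrary cap chosen to suit the tiny sample. The feature-name lookup is written to work with both older and newer scikit-learn versions.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for the course's `reviews` data (made-up rows).
reviews = pd.DataFrame({
    "score": [1, 0, 1],
    "review": [
        "Great product, works great",
        "Terrible quality, broke quickly",
        "Good value and great quality",
    ],
})

# Step 1: create the vectorizer, keeping only the most frequent tokens.
# The cap of 5 is an assumption for this small sample.
vect = CountVectorizer(max_features=5)

# Step 2: fit on the review text to learn the vocabulary.
vect.fit(reviews["review"])

# Step 3: transform the review text into a sparse document-term matrix.
X = vect.transform(reviews["review"])

# Newer scikit-learn exposes get_feature_names_out();
# older releases only have get_feature_names().
if hasattr(vect, "get_feature_names_out"):
    columns = vect.get_feature_names_out()
else:
    columns = vect.get_feature_names()

# Step 4: densify the sparse matrix and label columns with the vocabulary.
X_df = pd.DataFrame(X.toarray(), columns=columns)
print(X_df.head())
```

Each row of X_df corresponds to one review and each column to one vocabulary token, with the cell holding how many times that token occurs in the review. Calling .toarray() is fine for a small sample, but a dense array can become large when the vocabulary is not capped.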
