BOW using product reviews
You have practiced building a bag-of-words (BOW) representation on a small dataset. Now you will apply it to a sample of Amazon product reviews. The data has been imported for you and is called reviews. It contains two columns. The first one is called score; it is 0 when the review is negative and 1 when it is positive. The second column is called review and contains the text of the review that a customer wrote. Feel free to explore the data in the IPython Shell.
Your task is to build a BOW vocabulary, using the review column.
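Remember that we can call the .get_feature_names() method on the vectorizer to obtain a list of all the vocabulary elements. In scikit-learn 1.0 and later the method is named .get_feature_names_out(), and the older name was removed in version 1.2.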
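As a rough illustration of how the vocabulary can be inspected, the sketch below fits a vectorizer on a tiny made-up corpus. The docs list and the max_features value of 2 are assumptions chosen for the example and are not part of the exercise data. The hasattr check is only there so the snippet works with both older and newer scikit-learn releases.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus standing in for the review column
docs = [
    "great product, works great",
    "terrible product, broke quickly",
    "great value",
]

# Keep only the 2 most frequent terms across the whole corpus
vect = CountVectorizer(max_features=2)
vect.fit(docs)

# get_feature_names_out() exists in scikit-learn >= 1.0;
# older releases only provide get_feature_names()
if hasattr(vect, "get_feature_names_out"):
    vocab = list(vect.get_feature_names_out())
else:
    vocab = list(vect.get_feature_names())

print(vocab)
```

Here "great" appears three times and "product" twice, so those two terms make up the vocabulary; every other word is dropped by the max_features limit.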
This exercise is part of the course Sentiment Analysis in Python.
Exercise instructions
- Create a CountVectorizer object, specifying the maximum number of features.
- Fit the vectorizer.
- Transform the review column using the fitted vectorizer.
- Create a DataFrame from the sparse matrix converted to a dense array, using the vocabulary elements as the column names.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
from sklearn.feature_extraction.text import CountVectorizer
# Build the vectorizer, specify max features
vect = ____(____=100)
# Fit the vectorizer
vect.____(reviews.review)
# Transform the review column
X_review = vect.____(reviews.review)
# Create the bow representation
X_df = pd.DataFrame(X_review._____, columns=___.____)
print(X_df.head())
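For reference, the same fit, transform, and DataFrame steps might look like this on a small stand-in dataset. The reviews DataFrame built here is hypothetical (three invented reviews), and max_features is set to 2 only to keep the output small; the real exercise uses the preloaded reviews data and 100 features.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for the preloaded reviews DataFrame
reviews = pd.DataFrame({
    "score": [1, 0, 1],
    "review": [
        "Great book, great story",
        "Boring book",
        "Great characters",
    ],
})

# Build the vectorizer, limiting the vocabulary to the 2 most frequent terms
vect = CountVectorizer(max_features=2)

# Learn the vocabulary from the review column
vect.fit(reviews.review)

# Encode each review as a row of term counts (a SciPy sparse matrix)
X_review = vect.transform(reviews.review)

# Column names come from the learned vocabulary; the method name
# depends on the installed scikit-learn version
if hasattr(vect, "get_feature_names_out"):
    names = vect.get_feature_names_out()
else:
    names = vect.get_feature_names()

# Convert the sparse matrix to a dense array and wrap it in a DataFrame
X_df = pd.DataFrame(X_review.toarray(), columns=names)
print(X_df.head())
```

Each row of X_df corresponds to one review and each column to one vocabulary term, so the first row holds a count of 2 under "great" because that word occurs twice in the first review.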