
Specify token sequence length with BOW

We saw in the video that specifying token sequences of different lengths, which we called n-grams, helps us capture context better, and context can be very important.
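
As a rough illustration of what an n-gram is, the sketch below splits a sentence into unigrams and bigrams in plain Python. The `ngrams` helper and the example sentence are hypothetical and are not part of the course materials.

```python
def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Made-up example sentence, split on whitespace
tokens = "i am not happy with this product".split()

unigrams = ngrams(tokens, 1)  # single tokens
bigrams = ngrams(tokens, 2)   # pairs of adjacent tokens

print(unigrams)
print(bigrams)
```

The bigram list contains "not happy", which keeps the negation attached to the word it modifies; the unigrams "not" and "happy" on their own lose that link.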

In this exercise, you will work with a sample of the Amazon product reviews. Your task is to build a BOW vocabulary from the review column while specifying the token sequence length.

This exercise is part of the course

Sentiment Analysis in Python


Exercise instructions

  • Build the vectorizer, specifying the token sequence length to be uni- and bigrams.
  • Fit the vectorizer.
  • Transform the review column using the fitted vectorizer.
  • In the DataFrame, make sure to correctly specify the column names.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Build the vectorizer, specify token sequence and fit
vect = ____(____=(___,___))
vect.____(reviews.review)

# Transform the review column
X_review = vect.____(reviews.review)

# Create the bow representation
X_df = pd.DataFrame(X_review.toarray(), columns=vect.____)
print(X_df.head())