
Specify token sequence length with BOW

As we saw in the video, specifying the length of the token sequences (what we called n-grams) lets us capture more of the context around each word, which can matter a great deal for sentiment.

In this exercise, you will work with a sample of Amazon product reviews. Your task is to build a BOW vocabulary from the review column while specifying the token sequence length.
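To make the idea of n-grams concrete, here is a minimal sketch in plain Python. The helper word_ngrams and the example sentence are made up for illustration, and the helper uses simple whitespace splitting, which is an assumption and not the tokenizer the course's vectorizer uses.

```python
def word_ngrams(text, n_min=1, n_max=2):
    """Return all word n-grams with lengths from n_min to n_max (hypothetical helper)."""
    # Simplified tokenization: lowercase and split on whitespace
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of n tokens across the sentence
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

example = "not a good product"
unigrams = word_ngrams(example, 1, 1)
uni_and_bigrams = word_ngrams(example, 1, 2)

print(unigrams)
print(uni_and_bigrams)
```

With unigrams only, "good" appears on its own; with bigrams added, sequences such as "a good" and "good product" are kept together, so more of the surrounding context survives.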

This exercise is part of the course

Sentiment Analysis in Python


Exercise instructions

  • Build the vectorizer, specifying the token sequence length to be uni- and bigrams.
  • Fit the vectorizer.
  • Transform the review column using the fitted vectorizer.
  • When creating the DataFrame, make sure to specify the column names correctly.

Interactive exercise

Try this exercise by completing the sample code below.

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Build the vectorizer, specify token sequence and fit
vect = ____(____=(___,___))
vect.____(reviews.review)

# Transform the review column
X_review = vect.____(reviews.review)

# Create the bow representation
X_df = pd.DataFrame(X_review.toarray(), columns=vect.____)
print(X_df.head())