
Specify token sequence length with BOW

As we saw in the video, specifying the length of the token sequences (what we called n-grams) lets us capture more of the context around each word, which can be very important.

In this exercise, you will work with a sample of Amazon product reviews. Your task is to build a BOW vocabulary from the review column while specifying the token sequence length.
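To make the idea of token sequence length concrete, here is a minimal sketch in plain Python. The helper `word_ngrams` and the sample sentence are made up for illustration and are not part of the exercise; the sketch splits on whitespace only, which is simpler than what a real vectorizer does.

```python
def word_ngrams(text, n):
    """Return every run of n adjacent whitespace-separated tokens in `text`."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "the product is not good"  # hypothetical review text

unigrams = word_ngrams(sentence, 1)  # single tokens
bigrams = word_ngrams(sentence, 2)   # pairs of adjacent tokens

print(unigrams)
print(bigrams)
```

With unigrams alone, "not" and "good" are counted separately; the bigram "not good" keeps the negation attached to the word it modifies.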

This exercise is part of the course

Sentiment Analysis in Python


Exercise instructions

  • Build the vectorizer, setting the token sequence length to cover both unigrams and bigrams.
  • Fit the vectorizer.
  • Transform the review column using the fitted vectorizer.
  • In the DataFrame, make sure to correctly specify the column names.

Hands-on interactive exercise

Try this exercise by completing the sample code.

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Build the vectorizer, specify token sequence and fit
vect = ____(____=(___,___))
vect.____(reviews.review)

# Transform the review column
X_review = vect.____(reviews.review)

# Create the bow representation
X_df = pd.DataFrame(X_review.toarray(), columns=vect.____)
print(X_df.head())