
Tfidf and BOW on the same data

In this exercise, you will transform the review column of the Amazon product reviews using both a bag-of-words and a tfidf transformation.

Build both vectorizers, setting only one argument: a maximum of 100 features. After the transformation, create a DataFrame from each result and print the first 5 rows of each.

Be careful how you specify the maximum number of features in the vocabulary. A large vocabulary size can result in your session being disconnected.
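To make the difference between the two transformations concrete, here is a minimal pure-Python sketch on a toy corpus. The three documents are invented for illustration, and the weighting is the textbook `count * log(N / df)` form. scikit-learn's `TfidfVectorizer` applies a smoothed idf and L2 normalization by default, so its numbers will differ, but the idea is the same: terms that appear in many documents are down-weighted.

```python
import math
from collections import Counter

# Toy corpus standing in for the review column (hypothetical data).
docs = [
    "good book good story",
    "bad book",
    "good price",
]

tokenized = [doc.split() for doc in docs]
vocab = sorted({tok for doc in tokenized for tok in doc})

# Bag-of-words: raw count of each vocabulary term in each document.
bow = [[Counter(doc)[term] for term in vocab] for doc in tokenized]

# Document frequency: number of documents that contain each term.
n_docs = len(tokenized)
df = {term: sum(term in doc for doc in tokenized) for term in vocab}

# Textbook tf-idf: term count scaled by log(N / document frequency).
tfidf = [
    [count * math.log(n_docs / df[term]) for term, count in zip(vocab, row)]
    for row in bow
]

print(vocab)
for row_bow, row_tfidf in zip(bow, tfidf):
    print(row_bow, [round(w, 3) for w in row_tfidf])
```

In the first document, "good" has the highest raw count, yet "story" receives the larger tf-idf weight because it occurs in only one document.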

This exercise is part of the course

Sentiment Analysis in Python


Exercise instructions

  • Import the BOW and Tfidf vectorizers.
  • Build and fit a BOW and a Tfidf vectorizer from the review column and limit the number of created features to 100.
  • Create DataFrames from the transformed vector representations.

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Import the required packages
____

# Build BOW and tfidf vectorizers from the review column, with a maximum of 100 features
vect1 = ____(____=100).____(____.____)
vect2 = ____(____=100).____(____.____) 

# Transform the reviews with the fitted vectorizers
X1 = vect1.transform(reviews.review)
X2 = vect2.transform(reviews.review)

# Create DataFrames from the transformed vectors
X_df1 = pd.DataFrame(X1.____, columns=____.____)
X_df2 = pd.DataFrame(X2.____, columns=____.____)
print('Top 5 rows using BOW: \n', X_df1.head())
print('Top 5 rows using tfidf: \n', X_df2.head())