Tfidf and a BOW on the same data
In this exercise, you will transform the review column of the Amazon product reviews using both a bag-of-words and a tfidf transformation.
Build both vectorizers, specifying only the maximum number of features, set to 100. Create DataFrames after the transformation and print the top 5 rows of each.
Be careful how you specify the maximum number of features in the vocabulary. A large vocabulary size can result in your session being disconnected.
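To make the warning concrete, here is a minimal sketch of how the max_features argument caps the vocabulary. It assumes scikit-learn's CountVectorizer; the three-sentence docs list is a hypothetical stand-in for the review column, not the course data.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in corpus; the exercise uses reviews.review instead.
docs = [
    "good product good price",
    "bad product",
    "good product service",
]

# Without a cap, every distinct token becomes a feature (a column).
full = CountVectorizer().fit(docs)

# With max_features=2, only the 2 most frequent tokens across the corpus are kept.
capped = CountVectorizer(max_features=2).fit(docs)

full_vocab = sorted(full.vocabulary_)
capped_vocab = sorted(capped.vocabulary_)

print(len(full_vocab), full_vocab)
print(len(capped_vocab), capped_vocab)

# The dense array has one column per kept feature, so its size grows
# as (number of rows) x (vocabulary size).
dense_shape = capped.transform(docs).toarray().shape
print(dense_shape)
```

Because the dense array built for a DataFrame has one column per vocabulary entry, an uncapped vocabulary on a large review set can use a lot of memory, which is presumably why the exercise asks for a small cap.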
This exercise is part of the course
Sentiment Analysis in Python
Exercise instructions
- Import the BOW and Tfidf vectorizers.
- Build and fit a BOW and a Tfidf vectorizer from the review column and limit the number of created features to 100.
- Create DataFrames from the transformed vector representations.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import the required packages
____
# Build and fit BOW and tfidf vectorizers from the review column, with a maximum of 100 features
vect1 = ____(____=100).____(____.____)
vect2 = ____(____=100).____(____.____)
# Transform the review column with the fitted vectorizers
X1 = vect1.transform(reviews.review)
X2 = vect2.transform(reviews.review)
# Create DataFrames from the transformed representations
X_df1 = pd.DataFrame(X1.____, columns=____.____)
X_df2 = pd.DataFrame(X2.____, columns=____.____)
print('Top 5 rows using BOW: \n', X_df1.head())
print('Top 5 rows using tfidf: \n', X_df2.head())