Your first BOW
A bag-of-words is an approach to transform text to numeric form.
In this exercise, you will apply a BOW to the annak
list before moving on to a larger dataset in the next exercise.
Your task will be to work with this list and apply a BOW using the CountVectorizer()
. This transformation is your first step in being able to understand the sentiment of a text. Pay attention to words which might carry a strong sentiment.
Remember that the output of a CountVectorizer()
is a sparse matrix, which stores only entries which are non-zero. To look at the actual content of this matrix, we convert it to a dense array using the .toarray()
method.
Note that in this case you don't need to specify the max_features
argument because the text is short.
This exercise is part of the course
Sentiment Analysis in Python
Exercise instructions
- Import the count vectorizer function from
sklearn.feature_extraction.text
. - Build and fit the vectorizer on the small dataset.
- Create the BOW representation with name
anna_bow
by calling thetransform()
method. - Print the BOW result as a dense array.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import the required function
____
annak = ['Happy families are all alike;', 'every unhappy family is unhappy in its own way']
# Build the vectorizer and fit it
anna_vect = ____
____.____(annak)
# Create the bow representation
anna_bow = anna_vect.____(annak)
# Print the bag-of-words result
print(anna_bow.toarray())