Specify the token pattern
In this exercise, you will work with the text column of the tweets dataset. Your task is to vectorize the object column using CountVectorizer. You will apply different patterns of tokens in the vectorizer. Remember that by specifying the token pattern, you can filter out characters.
The CountVectorizer has been imported for you.
This exercise is part of the course Sentiment Analysis in Python.
Interactive exercise
Try this exercise by completing this sample code.
# Build and fit the vectorizer
vect = ____(____=r'\b[^\d\W][^\d\W]+\b').fit(tweets.text)
vect.transform(tweets.text)
print('Length of vectorizer: ', len(vect.get_feature_names()))
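
To see why the pattern above filters out characters, it may help to apply it directly with Python's re module. The sketch below is not the exercise solution: it only mimics the way CountVectorizer lowercases text and extracts tokens with a regular expression. The sample tweet and the tokenize helper are made up for illustration; the first pattern is CountVectorizer's documented default, and the second is the one from the exercise.

```python
import re

# Default token pattern of CountVectorizer: words of 2+ word characters.
DEFAULT_PATTERN = r"(?u)\b\w\w+\b"
# Pattern from the exercise: 2+ characters that are neither digits nor non-word characters.
LETTERS_PATTERN = r"\b[^\d\W][^\d\W]+\b"

# Hypothetical tweet, standing in for one row of tweets.text.
sample_tweet = "Flight AA123 delayed 2 hours, gate B7 at 7am!"


def tokenize(text, pattern):
    """Lowercase the text and return every regex match, as a rough stand-in
    for the tokenization step of CountVectorizer."""
    return re.findall(pattern, text.lower())


default_tokens = tokenize(sample_tweet, DEFAULT_PATTERN)
letters_tokens = tokenize(sample_tweet, LETTERS_PATTERN)

print(default_tokens)  # ['flight', 'aa123', 'delayed', 'hours', 'gate', 'b7', 'at', '7am']
print(letters_tokens)  # ['flight', 'delayed', 'hours', 'gate', 'at']
```

Tokens that contain a digit anywhere (aa123, b7, 7am) disappear under the second pattern, because the word boundaries force the whole token to consist of non-digit word characters. Single-character tokens such as 2 are dropped by both patterns, since each requires at least two characters.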