Specify the token pattern
In this exercise, you will work with the text column of the tweets dataset. Your task is to vectorize the object column using CountVectorizer. You will apply different patterns of tokens in the vectorizer. Remember that by specifying the token pattern, you can filter out characters.
The CountVectorizer has been imported for you.
Latihan ini adalah bagian dari kursus
Sentiment Analysis in Python
Latihan interaktif praktis
Cobalah latihan ini dengan menyelesaikan kode contoh berikut.
# Build and fit the vectorizer
vect = ____(____=r'\b[^\d\W][^\d\W]+\b').fit(tweets.text)
vect.transform(tweets.text)
print('Length of vectorizer: ', len(vect.get_feature_names()))