BoW vectors for movie reviews
In this exercise, you have been given two pandas Series, X_train and X_test, which consist of movie reviews. They represent the training and the test review data respectively. Your task is to preprocess the reviews and generate BoW vectors for these two sets using CountVectorizer.
Once we have generated the BoW matrices X_train_bow and X_test_bow, we will be in a good position to train a machine learning model on them and perform sentiment analysis.
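As a rough sketch of that next step, a classifier can be trained directly on the BoW counts. The labelled reviews below are hypothetical. The choice of multinomial Naive Bayes is ours, since the exercise does not name a model.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical labelled reviews (1 = positive, 0 = negative)
X_train = pd.Series(["loved this film", "brilliant acting, loved it",
                     "boring plot", "terrible and boring"])
y_train = [1, 1, 0, 0]
X_test = pd.Series(["loved the acting", "so boring"])

# Same preprocessing as in the exercise: lowercase, drop English stop words
vectorizer = CountVectorizer(lowercase=True, stop_words="english")
X_train_bow = vectorizer.fit_transform(X_train)
X_test_bow = vectorizer.transform(X_test)

# Multinomial Naive Bayes (a model choice assumed here) accepts sparse count features as-is
clf = MultinomialNB()
clf.fit(X_train_bow, y_train)
print(clf.predict(X_test_bow))
```

Any scikit-learn classifier that accepts sparse input could be substituted here. Naive Bayes is just a common baseline for word counts.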
This exercise is part of the course Feature Engineering for NLP in Python.
Exercise instructions
- Import CountVectorizer from the sklearn library.
- Instantiate a CountVectorizer object named vectorizer. Ensure that all words are converted to lowercase and English stop words are removed.
- Using X_train, fit vectorizer and then use it to transform X_train to generate the set of BoW vectors X_train_bow.
- Transform X_test using vectorizer to generate the set of BoW vectors X_test_bow.
Hands-on interactive exercise
Practice this exercise with the following sample code.
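# Import CountVectorizer
from sklearn.feature_extraction.text import CountVectorizer
# Create a CountVectorizer object that lowercases text and removes English stop words
vectorizer = CountVectorizer(lowercase=True, stop_words='english')
# Fit the vectorizer on X_train (a pandas Series provided by the exercise) and transform it
X_train_bow = vectorizer.fit_transform(X_train)
# Transform X_test using the vocabulary learned from X_train
X_test_bow = vectorizer.transform(X_test)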
# Print shape of X_train_bow and X_test_bow
print(X_train_bow.shape)
print(X_test_bow.shape)