BoW vectors for movie reviews
In this exercise, you have been given two pandas Series, X_train and X_test, which consist of movie reviews. They represent the training and the test review data respectively. Your task is to preprocess the reviews and generate BoW vectors for these two sets using CountVectorizer.
Once we have generated the BoW vector matrices X_train_bow and X_test_bow, we will be in a very good position to apply a machine learning model to them and conduct sentiment analysis.
This exercise is part of the course Feature Engineering for NLP in Python.
Exercise instructions
- Import CountVectorizer from the sklearn library.
- Instantiate a CountVectorizer object named vectorizer. Ensure that all words are converted to lowercase and english stopwords are removed.
- Using X_train, fit vectorizer and then use it to transform X_train to generate the set of BoW vectors X_train_bow.
- Transform X_test using vectorizer to generate the set of BoW vectors X_test_bow.
Hands-on interactive exercise
Try this exercise by completing the sample code.
# Import CountVectorizer
from sklearn.feature_extraction.text import ____
# Create a CountVectorizer object
vectorizer = ____(lowercase=____, stop_words=____)
# Fit and transform X_train
X_train_bow = vectorizer.____(____)
# Transform X_test
X_test_bow = vectorizer.____(____)
# Print shape of X_train_bow and X_test_bow
print(X_train_bow.shape)
print(X_test_bow.shape)
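For reference, one way the template above could be completed is sketched below. The review texts are hypothetical stand-ins, not the course data, and plain Python lists are used in place of the pandas Series (CountVectorizer accepts any iterable of strings, so a Series should behave the same way). The point to notice is that the vectorizer is fitted on the training reviews only, so both matrices share the vocabulary learned from X_train.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-ins for the exercise's X_train and X_test
# (the real exercise supplies pandas Series of movie reviews).
X_train = [
    "The movie was great and the acting was superb",
    "A dull plot and terrible acting",
    "Great plot, great cast",
]
X_test = [
    "Terrible movie with a great soundtrack",
]

# Lowercase all text and drop scikit-learn's built-in English stop words.
vectorizer = CountVectorizer(lowercase=True, stop_words="english")

# Learn the vocabulary from the training reviews and vectorize them.
X_train_bow = vectorizer.fit_transform(X_train)

# Reuse the learned vocabulary for the test reviews (no refitting).
X_test_bow = vectorizer.transform(X_test)

# Both matrices have one row per review and one column per vocabulary word.
print(X_train_bow.shape)
print(X_test_bow.shape)
```

Because X_test is only transformed, words that never occur in the training reviews (such as "soundtrack" in this toy data) are ignored, and the two matrices end up with the same number of columns.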