Building vocabulary from customer reviews
You're part of a product analytics team at TechZone, a consumer electronics company. You've received a small batch of customer reviews for a new gadget. To analyze the reviews, you'll first preprocess the text and build a vocabulary, a list of unique words that defines the features used to represent each review as numerical data.
A preprocess() function is pre-loaded for you. It lowercases the text, tokenizes it, and removes punctuation.
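The body of the pre-loaded preprocess() is not shown in the exercise, so the following is only a minimal sketch of what such a helper might look like, using the standard library alone. The whitespace tokenization and the choice to return a single space-joined string (so the result can be passed straight to a vectorizer) are assumptions, not the course's actual implementation.

```python
import string

def preprocess(text):
    """Hypothetical stand-in for the pre-loaded preprocess() helper."""
    # Lowercase the text
    text = text.lower()
    # Tokenize on whitespace (assumption: the real helper may use a proper tokenizer)
    tokens = text.split()
    # Remove punctuation characters from each token
    table = str.maketrans("", "", string.punctuation)
    tokens = [token.translate(table) for token in tokens]
    # Drop tokens that were punctuation only and rejoin into one string
    return " ".join(token for token in tokens if token)

print(preprocess("The product is fantastic! It works like a charm."))
```

With this sketch, the first review becomes a lowercase string with the exclamation mark and the final period removed.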
This exercise is part of the course
Natural Language Processing (NLP) in Python
Exercise instructions
- Preprocess each review in the dataset using the preprocess() function.
- Fit the vectorizer on the preprocessed reviews.
- Print the resulting vocabulary.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
reviews = [
    "The product is fantastic! It works like a charm.",
    "I hated the product. It broke after one use.",
    "Product was okay, not the best, but fine overall."
]
# Preprocess the reviews
cleaned_reviews = [____ for ____ in ____]
vectorizer = CountVectorizer()
# Fit the vectorizer
vectorizer.____
# Print the vocabulary
print(vectorizer.____)
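
One plausible way to complete the template is sketched below. It is not the course's official solution: the preprocess() defined here is a hypothetical stand-in for the pre-loaded helper, and the import line is an assumption about what the exercise environment provides behind the scenes. The CountVectorizer calls themselves (fit() and the vocabulary_ attribute) are part of scikit-learn.

```python
import string
from sklearn.feature_extraction.text import CountVectorizer

def preprocess(text):
    # Hypothetical stand-in for the pre-loaded helper:
    # lowercase, split on whitespace, strip punctuation, rejoin
    table = str.maketrans("", "", string.punctuation)
    tokens = [token.translate(table) for token in text.lower().split()]
    return " ".join(token for token in tokens if token)

reviews = [
    "The product is fantastic! It works like a charm.",
    "I hated the product. It broke after one use.",
    "Product was okay, not the best, but fine overall."
]

# Preprocess the reviews
cleaned_reviews = [preprocess(review) for review in reviews]

vectorizer = CountVectorizer()

# Fit the vectorizer: this learns the vocabulary from the cleaned reviews
vectorizer.fit(cleaned_reviews)

# Print the vocabulary: a dict mapping each word to its column index
print(vectorizer.vocabulary_)
```

Note that vocabulary_ is a dictionary from each word to its feature index, with indices assigned in alphabetical order of the words. With the default settings, CountVectorizer only keeps tokens of two or more characters, so single-letter words such as "a" and "i" do not appear in the vocabulary.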