Building vocabulary from customer reviews
You're part of a product analytics team at TechZone, a consumer electronics company. You've received a small batch of customer reviews for a new gadget. To analyze the reviews, you'll first preprocess the text and build a vocabulary, a list of unique words that defines the features used to represent each review as numerical data.
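To make the idea of a vocabulary concrete, here is a small hand-rolled sketch in plain Python. The two example texts and the helper name `to_counts` are made up for illustration and are not part of the exercise; the point is only that each unique word gets a column index, and each text becomes a list of counts aligned with those indices.

```python
# Toy illustration: build a vocabulary by hand and turn texts into count vectors.
docs = ["good screen good battery", "bad battery"]  # made-up examples

# Vocabulary: every unique word mapped to a column index (sorted for a stable order)
unique_words = sorted({word for doc in docs for word in doc.split()})
vocab = {word: idx for idx, word in enumerate(unique_words)}

def to_counts(text, vocab):
    """Return word counts aligned with the vocabulary's column indices."""
    counts = [0] * len(vocab)
    for word in text.split():
        if word in vocab:
            counts[vocab[word]] += 1
    return counts

vectors = [to_counts(doc, vocab) for doc in docs]
print(vocab)    # {'bad': 0, 'battery': 1, 'good': 2, 'screen': 3}
print(vectors)  # [[0, 1, 2, 1], [1, 1, 0, 0]]
```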
A preprocess() function is pre-loaded for you. It lowercases the text, tokenizes it, and removes punctuation.
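The course's actual preprocess() is not shown here. A minimal stand-in that follows the same three steps might look like the sketch below. It assumes a simple regex tokenizer and assumes the function returns the tokens joined back into a single string; the pre-loaded version may differ on both points.

```python
import re

def preprocess(text):
    """Hypothetical stand-in for the pre-loaded helper (not the course's code)."""
    lowered = text.lower()                # 1. lowercase
    tokens = re.findall(r"\w+", lowered)  # 2. tokenize; 3. punctuation is never matched, so it is dropped
    return " ".join(tokens)               # assumption: return one cleaned string

print(preprocess("The product is fantastic! It works like a charm."))
# the product is fantastic it works like a charm
```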
This exercise is part of the course
Natural Language Processing (NLP) in Python
Exercise instructions
- Preprocess each review in the dataset using the preprocess() function.
- Fit the vectorizer on the preprocessed reviews.
- Print the resulting vocabulary.
Interactive hands-on exercise
Try to solve this exercise by completing the sample code.
reviews = [
    "The product is fantastic! It works like a charm.",
    "I hated the product. It broke after one use.",
    "Product was okay, not the best, but fine overall."
]
# Preprocess the reviews
cleaned_reviews = [____ for ____ in ____]
vectorizer = CountVectorizer()
# Fit the vectorizer
vectorizer.____
# Print the vocabulary
print(vectorizer.____)