
Building vocabulary from customer reviews

You're part of a product analytics team at TechZone, a consumer electronics company. You've received a small batch of customer reviews for a new gadget. To analyze the reviews, you'll first preprocess the text and build a vocabulary: the set of unique words across all reviews. Each word in the vocabulary becomes one feature, so every review can be represented as numerical data.
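To make the idea of a vocabulary concrete, here is a minimal plain-Python sketch, independent of any library. The two short documents are made up for illustration and are not part of the exercise data.

```python
# Two toy documents (hypothetical, already lowercased and punctuation-free)
docs = ["the product works", "the product broke"]

# The vocabulary is the sorted set of unique words across all documents
vocabulary = sorted({word for doc in docs for word in doc.split()})

# Each document becomes a vector of word counts, one entry per vocabulary word
vectors = [[doc.split().count(word) for word in vocabulary] for doc in docs]

print(vocabulary)  # ['broke', 'product', 'the', 'works']
print(vectors)     # [[0, 1, 1, 1], [1, 1, 1, 0]]
```

Each position in a vector corresponds to one vocabulary word, which is what "features" means here.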

A preprocess() function is pre-loaded for you. It lowercases the text, tokenizes it, and removes punctuation.
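The source of the pre-loaded function is not shown in the exercise. As a rough sketch of what such a function might look like, the version below is a hypothetical stand-in: it assumes a simple regex tokenizer and assumes the tokens are joined back into a single string. The course's actual implementation may differ.

```python
import re
import string

PUNCTUATION = set(string.punctuation)

def preprocess(text):
    """Hypothetical stand-in for the pre-loaded preprocess() function."""
    # Lowercase the text
    text = text.lower()
    # Tokenize into runs of word characters and single punctuation marks
    tokens = re.findall(r"\w+|[^\w\s]", text)
    # Drop tokens that are punctuation marks
    tokens = [token for token in tokens if token not in PUNCTUATION]
    # Assumption: return one string, since a vectorizer expects text input
    return " ".join(tokens)

print(preprocess("The product is fantastic! It works like a charm."))
# the product is fantastic it works like a charm
```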

This exercise is part of the course

Natural Language Processing (NLP) in Python


Exercise instructions

  • Preprocess each review in the dataset using the preprocess() function.
  • Fit the vectorizer on the preprocessed reviews.
  • Print the resulting vocabulary.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

from sklearn.feature_extraction.text import CountVectorizer

reviews = [
    "The product is fantastic! It works like a charm.",
    "I hated the product. It broke after one use.",
    "Product was okay, not the best, but fine overall."
]
# Preprocess the reviews
cleaned_reviews = [____ for ____ in ____]

vectorizer = CountVectorizer()
# Fit the vectorizer
vectorizer.____
# Print the vocabulary 
print(vectorizer.____)
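For reference, one way the three steps could fit together is sketched below. It is not the official solution. The preprocess() defined here is a hypothetical stand-in for the pre-loaded function, and the sketch assumes scikit-learn is installed.

```python
import re
from sklearn.feature_extraction.text import CountVectorizer

def preprocess(text):
    # Hypothetical stand-in: lowercase, keep only word tokens, rejoin as a string
    return " ".join(re.findall(r"\w+", text.lower()))

reviews = [
    "The product is fantastic! It works like a charm.",
    "I hated the product. It broke after one use.",
    "Product was okay, not the best, but fine overall."
]

# Preprocess the reviews
cleaned_reviews = [preprocess(review) for review in reviews]

vectorizer = CountVectorizer()
# Fit the vectorizer: this learns the vocabulary from the cleaned reviews
vectorizer.fit(cleaned_reviews)
# Print the vocabulary: a dict mapping each word to its column index
print(vectorizer.vocabulary_)
```

With its default settings, CountVectorizer ignores single-character tokens, so "a" and "i" do not appear in the vocabulary. The column indices follow the alphabetical order of the words.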