
Building vocabulary from customer reviews

You're part of a product analytics team at TechZone, a consumer electronics company. You've received a small batch of customer reviews for a new gadget. To analyze them, you'll first preprocess the text and then build a vocabulary: the list of unique words across all reviews. Each word in the vocabulary becomes one feature when a review is represented as numerical data.

A preprocess() function is pre-loaded for you. It lowercases the text, tokenizes it, and removes punctuation.
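
The actual implementation of the pre-loaded preprocess() is not shown in this exercise. As a rough sketch only, a function with the three behaviors described above might look like the following. The whitespace tokenization and the choice to return a single joined string (rather than a list of tokens) are assumptions made for illustration.

```python
import string


def preprocess(text):
    """Hypothetical stand-in for the pre-loaded preprocess() helper."""
    # Lowercase the text
    text = text.lower()
    # Tokenize on whitespace (assumption: the real helper may use a proper tokenizer)
    tokens = text.split()
    # Remove punctuation characters from each token
    table = str.maketrans("", "", string.punctuation)
    tokens = [token.translate(table) for token in tokens]
    # Drop tokens that were made up of punctuation only
    tokens = [token for token in tokens if token]
    # Rejoin into one string (assumption), the input format a vectorizer expects
    return " ".join(tokens)


print(preprocess("The product is fantastic! It works like a charm."))
# the product is fantastic it works like a charm
```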

This exercise is part of the course

Natural Language Processing (NLP) in Python


Exercise instructions

  • Preprocess each review in the dataset using the preprocess() function.
  • Fit the vectorizer on the preprocessed reviews.
  • Print the resulting vocabulary.

Hands-on interactive exercise

Try this exercise by completing this sample code.

reviews = [
    "The product is fantastic! It works like a charm.",
    "I hated the product. It broke after one use.",
    "Product was okay, not the best, but fine overall."
]
# Preprocess the reviews
cleaned_reviews = [____ for ____ in ____]

vectorizer = CountVectorizer()
# Fit the vectorizer
vectorizer.____
# Print the vocabulary 
print(vectorizer.____)
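
One possible way to complete the template is sketched below, so that the three steps can be run end to end outside the exercise environment. It assumes scikit-learn is installed. The preprocess() defined here is a simplified, hypothetical replacement for the pre-loaded helper, and the vocabulary_ attribute is just one way to fill the final blank.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer


def preprocess(text):
    # Hypothetical replacement for the pre-loaded helper:
    # lowercase, strip punctuation, and normalize whitespace
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


reviews = [
    "The product is fantastic! It works like a charm.",
    "I hated the product. It broke after one use.",
    "Product was okay, not the best, but fine overall."
]

# Preprocess the reviews
cleaned_reviews = [preprocess(review) for review in reviews]

vectorizer = CountVectorizer()
# Fit the vectorizer: this learns the vocabulary from the cleaned reviews
vectorizer.fit(cleaned_reviews)

# Print the vocabulary: a dict mapping each word to its column index
print(vectorizer.vocabulary_)
```

With its default settings, CountVectorizer only keeps tokens of two or more characters, so single-letter words such as "a" and "i" do not appear in the vocabulary. In recent scikit-learn versions, vectorizer.get_feature_names_out() returns the same words as an alphabetically sorted array, which is often easier to read than the dict.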