
TF-IDF representation of product feedback

You're working with a customer support team at a smart home company. They've collected user feedback on a range of smart devices and want to identify which words stand out in each review. You suggest using the TF-IDF technique to highlight the most relevant terms across feedback entries. Let's help them get started!
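To make "words that stand out" concrete, here is a minimal sketch of how a TF-IDF weight can be computed by hand. It assumes the smoothed IDF variant and L2 normalization that scikit-learn's TfidfVectorizer uses by default. The three short documents are made-up illustrations, not the exercise data.

```python
import math
from collections import Counter

# Hypothetical, already-cleaned toy documents (not the exercise reviews)
docs = [
    "the smart speaker is incredible",
    "the smart bulb stopped working",
    "the thermostat is okay",
]
tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)

# Document frequency: number of documents that contain each term
doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

def idf(term):
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1
    return math.log((1 + n_docs) / (1 + doc_freq[term])) + 1

def tfidf(tokens):
    # Raw term count times IDF, then L2-normalize the document vector
    counts = Counter(tokens)
    raw = {term: count * idf(term) for term, count in counts.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {term: w / norm for term, w in raw.items()}

weights = tfidf(tokenized[0])
for term, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term:12s}{weight:.3f}")
```

In this sketch, a term that appears in every document (such as "the") receives the lowest weight, while a term unique to one document (such as "speaker") receives the highest.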

A preprocess() function that takes a text string and returns the processed text is pre-loaded for you. It applies lowercasing, tokenization, and punctuation removal. pandas has been imported as pd, and the TfidfVectorizer class is ready to use.

This exercise is part of the course

Natural Language Processing (NLP) in Python


Exercise instructions

  • Initialize a TF-IDF vectorizer.
  • Transform the cleaned reviews into a tfidf_matrix.
  • Create a DataFrame df for the tfidf_matrix, having the vocabulary words as columns.

Interactive hands-on exercise

Try to solve this exercise by completing the sample code.

reviews = ["The smart speaker is incredible. Clear sound and fast responses!",
           "I am disappointed with the smart bulb. It stopped working in a week.",
           "The thermostat is okay. Not too smart, but functional."]
cleaned_reviews = [preprocess(review) for review in reviews]

# Initialize the vectorizer
vectorizer = ____
# Transform the cleaned reviews
tfidf_matrix = ____
# Create a DataFrame for TF-IDF
df = pd.DataFrame(
  tfidf_matrix.toarray(),
  columns=vectorizer.____
)
print(df.head())