
Frequency analysis of product reviews

You now have access to a larger dataset of TechZone product reviews. Just like before, you've preprocessed the reviews and transformed them into a bag-of-words (BoW) representation X. Your task now is to analyze the word frequencies and identify the most common terms in the dataset.

To help with the analysis, a helper function called get_top_ten() is provided. It takes in a list of words and their corresponding counts, and returns the 10 most frequent words and their counts.
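The body of get_top_ten() is not shown in the exercise. As a rough illustration only, a helper with that behavior could look like the sketch below. The implementation, the toy word list, and the toy counts are all assumptions made for this example; the course's actual helper may differ.

```python
import numpy as np

def get_top_ten(words, word_counts):
    """Hypothetical stand-in for the course helper.

    Returns the (up to) 10 most frequent words and their counts,
    most frequent first.
    """
    counts = np.asarray(word_counts).ravel()
    # Indices sorted by count, largest first, limited to ten entries
    top_idx = np.argsort(counts)[::-1][:10]
    top_words = [words[i] for i in top_idx]
    top_counts = [int(counts[i]) for i in top_idx]
    return top_words, top_counts

# Toy data (not from the TechZone dataset)
words = ["battery", "great", "the", "phone"]
counts = [2, 1, 5, 3]
top_words, top_counts = get_top_ten(words, counts)
print(top_words, top_counts)
```

Because the helper only needs indexable inputs, it works whether the words and counts arrive as Python lists or as NumPy arrays.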

This exercise is part of the course Natural Language Processing (NLP) in Python.

Hands-on interactive exercise

Complete this sample code to finish the exercise.

def preprocess(text):
    text = text.lower()
    tokens = word_tokenize(text)
    tokens = [word for word in tokens if word not in string.punctuation]
    return " ".join(tokens)
  
cleaned_reviews = [preprocess(review) for review in product_reviews]
X = vectorizer.fit_transform(cleaned_reviews)

# Get word counts
word_counts = np.____(X.____, axis=0)
# Get words
words = vectorizer.____

top_words_with_stopwords, top_counts_with_stopwords = get_top_ten(words, word_counts)
print(top_words_with_stopwords, top_counts_with_stopwords)