Frequency analysis of product reviews

You now have access to a larger dataset of reviews of TechZone products. As before, you have preprocessed the reviews and transformed them into a BoW representation X. Your goal now is to analyze the word frequencies and identify the most common terms in the dataset.

To help you, a utility function called get_top_ten() is provided. It takes a list of words and their corresponding occurrence counts as input, and returns the 10 most frequent words along with their counts.
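The exercise does not show how get_top_ten() is implemented. The following is a minimal sketch of what such a helper might look like, assuming it sorts by count in descending order and returns two parallel lists. Both the return format and the tie-breaking behavior are assumptions, and the demo words and counts are invented for illustration.

```python
def get_top_ten(words, word_counts):
    """Hypothetical stand-in for the course helper (not the actual implementation).

    Returns the 10 most frequent words and their counts as two parallel lists.
    Assumes word_counts is one-dimensional and aligned with words.
    """
    # Pair each word with its count and sort by count, highest first.
    # Python's sort is stable, so tied words keep their original order.
    pairs = sorted(zip(words, word_counts), key=lambda pair: pair[1], reverse=True)[:10]
    top_words = [str(word) for word, _ in pairs]
    top_counts = [int(count) for _, count in pairs]
    return top_words, top_counts


# Invented demo data, only to show the call shape
demo_words = ["battery", "great", "the", "phone"]
demo_counts = [2, 1, 5, 2]
top_words, top_counts = get_top_ten(demo_words, demo_counts)
print(top_words, top_counts)
```

With fewer than ten words, as in the demo, the sketch simply returns all of them in descending order of count.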

This exercise is part of the course

<cours>Natural Language Processing (NLP) in Python</cours>

Hands-on interactive exercise

Try this exercise by completing the sample code.

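import string

import numpy as np
from nltk.tokenize import word_tokenize

# product_reviews, vectorizer and get_top_ten() are assumed to be preloaded by the exercise environment

def preprocess(text):
    # Lowercase, tokenize, and drop tokens that are single punctuation characters
    text = text.lower()
    tokens = word_tokenize(text)
    tokens = [word for word in tokens if word not in string.punctuation]
    return " ".join(tokens)
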
cleaned_reviews = [preprocess(review) for review in product_reviews]
X = vectorizer.fit_transform(cleaned_reviews)

# Get word counts
word_counts = np.____(X.____, axis=0)
# Get words
words = vectorizer.____

top_words_with_stopwords, top_counts_with_stopwords = get_top_ten(words, word_counts)
print(top_words_with_stopwords, top_counts_with_stopwords)

Skill level: Intermediate

Learn the essentials of text processing in Natural Language Processing (NLP). Master techniques such as tokenization, stop word and punctuation removal, and text normalization with lowercasing, stemming, and lemmatization to prepare text data for further analysis and insight extraction.

Exercise 1: Introduction to natural language processing Exercise 2: Sentence and word tokenization Exercise 3: NLP workflow Exercise 4: Stop words and punctuation handling Exercise 5: Removing stop words Exercise 6: Removing punctuation Exercise 7: Text normalization techniques Exercise 8: Lowercasing Exercise 9: Stemming Exercise 10: Lemmatization

Transform raw text into powerful numerical features. Create Bag-of-Words and TF-IDF representations to capture word importance across documents, then explore word embeddings like Word2Vec and GloVe to uncover deep semantic patterns. Visualize frequency, relevance, and similarity to bring your text data to life.

Exercise 1: Bag-of-Words representation Exercise 2: Building a vocabulary from customer reviews Exercise 3: Transforming text into numbers with BoW Exercise 4: Frequency analysis of product reviews (current exercise) Exercise 5: Visualizing word frequencies Exercise 6: TF-IDF vectorization Exercise 7: TF-IDF representation of product reviews Exercise 8: Comparing BoW and TF-IDF representations Exercise 9: Embeddings Exercise 10: Exploring word relationships with embeddings Exercise 11: Visualizing and comparing word embeddings

Harness the power of pre-trained models to perform advanced text classification tasks. Use Hugging Face pipelines for sentiment analysis, topic classification, and natural language inference. Evaluate semantic similarity and grammatical correctness with state-of-the-art models, all without building anything from scratch.

Exercise 1: Hugging Face pipelines for sentiment analysis Exercise 2: Analyzing the sentiment of a review Exercise 3: Batch classifying multiple reviews Exercise 4: Comparing models on labeled review data Exercise 5: Zero-shot classification and QNLI Exercise 6: Zero-shot classification of support tickets Exercise 7: Does the text answer the question? Exercise 8: Question similarity and grammatical correctness Exercise 9: Detecting duplicate questions Exercise 10: Checking grammatical correctness

Dive into the core of modern NLP applications with token classification and text generation techniques. Learn to extract meaningful entities and grammatical structures using NER and PoS tagging. Master both extractive and abstractive question answering, and explore advanced generation tasks including summarization, translation, and language modeling using Hugging Face pipelines.

Exercise 1: Token classification Exercise 2: Identifying named entities in news headlines Exercise 3: Part of Speech tagging for text analysis Exercise 4: Question answering Exercise 5: Answering questions from product descriptions Exercise 6: Generating natural answers with abstractive QA Exercise 7: Sequence generation tasks Exercise 8: Summarizing news articles for quick insights Exercise 9: Translating customer reviews to French Exercise 10: Building a search completion system Exercise 11: Congratulations