Lowercasing

You are analyzing user reviews for a travel website. These reviews often use inconsistent capitalization, such as "TRAVEL" and "travel". To prepare the text for sentiment analysis and topic extraction, you first convert all words to lowercase, then tokenize the text, and finally remove stop words and punctuation.
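To see why casing matters before any counting or analysis, here is a minimal sketch using only the standard library. The sample sentence is made up for illustration and is not part of the course data.

```python
from collections import Counter

# Hypothetical mini-review, invented for this illustration
words = "TRAVEL was fun but travel delays ruined the Travel plans".split()

# Without lowercasing, each casing variant is counted as a separate word
raw_counts = Counter(words)

# After lowercasing, all three variants collapse into a single entry
lower_counts = Counter(word.lower() for word in words)

print(raw_counts["travel"])    # 1
print(lower_counts["travel"])  # 3
```

Any frequency-based step that follows would otherwise treat "TRAVEL", "Travel", and "travel" as three unrelated words.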

The word_tokenize() function and a stop_words list are provided. The NLTK resources have already been downloaded.

This exercise is part of the course

<Kurs>Natural Language Processing (NLP) in Python</Kurs>

Exercise instructions

Convert the provided review to lowercase.
Tokenize lower_text into words.
Use a list comprehension to remove stop words and punctuation, using the stop_words list and string.punctuation.
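The filtering step can be sketched in isolation as follows. The stop_words and tokens values here are small hypothetical stand-ins; in the exercise, stop_words is provided and the tokens come from word_tokenize().

```python
import string

# Hypothetical stand-ins, invented for illustration only
stop_words = ["i", "the", "and", "a", "just"]
tokens = ["i", "love", "the", "beach", ",", "and", "the", "food", "!"]

# Keep a token only if it is neither a stop word nor punctuation
clean_tokens = [word for word in tokens
                if word not in stop_words and word not in string.punctuation]

print(clean_tokens)  # ['love', 'beach', 'food']

# string.punctuation is a single string, so `in` is a substring check
print("!" in string.punctuation)    # True
print("..." in string.punctuation)  # False, so an ellipsis token would be kept
```

Because the membership test against string.punctuation is a substring check, multi-character punctuation tokens can slip through. Whether that matters depends on the tokenizer and the text.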

Interactive hands-on exercise

Try this exercise by completing the sample code.

review = "I have been FLYING a lot lately and the Flights just keep getting DELAYED. Honestly, traveling for WORK gets exhausting with endless delays, but every trip teaches you something new!"

# Lowercase the review
lower_text = ____

# Tokenize the lower_text into words
tokens = ____

# Remove stop words and punctuation
clean_tokens = [____ if word ____ and word ____]

print(clean_tokens)
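One possible way to fill in the blanks is sketched below. So that it runs without NLTK, it uses two hypothetical stand-ins: simple_tokenize(), a regex-based replacement for word_tokenize(), and a short hand-written stop_words set. In the course environment you would call the provided word_tokenize() and use the provided stop_words instead, and the output would differ accordingly.

```python
import re
import string

review = ("I have been FLYING a lot lately and the Flights just keep getting "
          "DELAYED. Honestly, traveling for WORK gets exhausting with endless "
          "delays, but every trip teaches you something new!")

# Assumption: a tiny hand-written stop word set, not NLTK's English list
stop_words = {"i", "have", "been", "a", "and", "the", "just",
              "for", "with", "but", "you"}

def simple_tokenize(text):
    """Hypothetical stand-in for word_tokenize(): word runs or single symbols."""
    return re.findall(r"\w+|[^\w\s]", text)

# Lowercase the review
lower_text = review.lower()

# Tokenize the lower_text into words
tokens = simple_tokenize(lower_text)

# Remove stop words and punctuation
clean_tokens = [word for word in tokens
                if word not in stop_words and word not in string.punctuation]

print(clean_tokens)
```

Lowercasing comes first so that tokens such as "FLYING" and "Flights" match the lowercase entries in the stop word list and are later counted together with their lowercase forms.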



Learn the essentials of text processing in Natural Language Processing (NLP). Master techniques such as tokenization, stop word and punctuation removal, and text normalization with lowercasing, stemming, and lemmatization to prepare text data for further analysis and insight extraction.

Exercise 1: Introduction to natural language processing Exercise 2: Sentence and word tokenization Exercise 3: NLP workflow Exercise 4: Handling stop words and punctuation Exercise 5: Removing stop words Exercise 6: Removing punctuation Exercise 7: Text normalization techniques Exercise 8: Lowercasing (current exercise) Exercise 9: Stemming Exercise 10: Lemmatization

Transform raw text into powerful numerical features. Create Bag-of-Words and TF-IDF representations to capture word importance across documents, then explore word embeddings like Word2Vec and GloVe to uncover deep semantic patterns. Visualize frequency, relevance, and similarity to bring your text data to life.

Exercise 1: Bag-of-Words representation Exercise 2: Building vocabulary from customer reviews Exercise 3: Transforming text to numbers with BoW Exercise 4: Frequency analysis of product reviews Exercise 5: Visualizing word frequencies Exercise 6: TF-IDF vectorization Exercise 7: TF-IDF representation of product feedback Exercise 8: Comparing BoW and TF-IDF representations Exercise 9: Embeddings Exercise 10: Exploring word relationships with embeddings Exercise 11: Visualizing and comparing word embeddings

Harness the power of pre-trained models to perform advanced text classification tasks. Use Hugging Face pipelines for sentiment analysis, topic classification, and natural language inference. Evaluate semantic similarity and grammatical correctness with state-of-the-art models, all without building anything from scratch.

Exercise 1: Hugging Face pipelines for sentiment analysis Exercise 2: Analyzing the sentiment of a review Exercise 3: Batch classifying multiple reviews Exercise 4: Comparing models on labeled review data Exercise 5: Zero-shot classification and QNLI Exercise 6: Zero-shot classification of support tickets Exercise 7: Does the text answer the question? Exercise 8: Question similarity and grammatical correctness Exercise 9: Detecting duplicate questions Exercise 10: Checking grammatical correctness

Dive into the core of modern NLP applications with token classification and text generation techniques. Learn to extract meaningful entities and grammatical structures using NER and PoS tagging. Master both extractive and abstractive question answering, and explore advanced generation tasks including summarization, translation, and language modeling using Hugging Face pipelines.

Exercise 1: Token classification Exercise 2: Identifying named entities in news headlines Exercise 3: Part of Speech tagging for text analysis Exercise 4: Question answering Exercise 5: Answering questions from product descriptions Exercise 6: Generating natural answers with abstractive QA Exercise 7: Sequence generation tasks Exercise 8: Summarizing news articles for quick insights Exercise 9: Translating customer reviews to French Exercise 10: Building a search completion system Exercise 11: Congratulations