Removing stop words
You're working on a project that classifies user feedback into categories such as "product issues", "service issues", and "suggestions". Stop words such as "the", "and", and "to" usually carry little meaning for distinguishing between categories. Your task is to remove them so that the remaining, more informative words can later help a model assign the feedback to the correct category.
The functions word_tokenize from nltk.tokenize and stopwords.words from nltk.corpus have been imported for you. Additionally, the NLTK resources punkt_tab and stopwords have already been downloaded.
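To make the idea of stop word removal concrete before the exercise, here is a minimal sketch that uses only the Python standard library. It is not the exercise solution and does not use NLTK: the tiny STOP_WORDS set, the regex tokenizer, and the remove_stop_words helper are all hypothetical stand-ins chosen for illustration, and NLTK's tokenizer and English stop word list behave differently in detail.

```python
import re

# Hypothetical, deliberately tiny stop word set for illustration only;
# a real list (such as NLTK's English list) is much longer.
STOP_WORDS = {"i", "a", "the", "and", "to", "was", "is", "it"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Split text into word and punctuation tokens, then drop stop words."""
    # Simple regex tokenizer (an assumption, not NLTK's word_tokenize):
    # runs of word characters, or single non-space symbols.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    # Compare in lowercase so "The" and "the" are treated alike.
    return [tok for tok in tokens if tok.lower() not in stop_words]


sample = "The delivery was late and the box arrived damaged."
filtered = remove_stop_words(sample)
print(filtered)
```

In this sketch the final period stays in the result because it is not in the stop word set; removing punctuation would need a separate filter.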
This exercise is part of the course Natural Language Processing (NLP) in Python.
Exercise instructions
- Tokenize the provided feedback into words.
- Get the list of English stop words.
- Remove the English stop words and save the result in filtered_tokens.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
feedback = "I reached out to support and got a helpful response within minutes!!! Very #impressed"
# Tokenize the text
tokens = word_tokenize(____)
# Get the list of English stop words
stop_words = stopwords.____('____')
# Remove stop words
filtered_tokens = [____ for word in tokens if ____.lower() not in ____]
print(filtered_tokens)