Lowercasing
You're analyzing user reviews for a travel website. These reviews often include inconsistent capitalization, such as "TRAVEL" and "travel". To prepare the text for sentiment analysis and topic extraction, you'll first convert all words to lowercase, then tokenize the text and remove stop words and punctuation.
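To see why case matters before any counting or topic extraction, here is a small illustration. The sample sentence is made up for this sketch, and a plain whitespace split stands in for a real tokenizer.

```python
from collections import Counter

# Hypothetical mini-review, used only for illustration
text = "TRAVEL tips for travel lovers who Travel often"

# Without lowercasing, each capitalization is counted as a separate word
raw_counts = Counter(text.split())

# Lowercasing first collapses the variants into a single key
lower_counts = Counter(text.lower().split())

print(raw_counts["travel"])    # 1
print(lower_counts["travel"])  # 3
```

Without lowercasing, a frequency-based analysis would split what is really one word across three separate entries.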
The word_tokenize() function and a stop_words list have been provided. NLTK resources are already downloaded.
This exercise is part of the course Natural Language Processing (NLP) in Python.
Exercise instructions
- Convert the provided review into lowercase.
- Tokenize the lower_text into words.
- Use a list comprehension to remove stop words and punctuation, using the stop_words list and string.punctuation.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
review = "I have been FLYING a lot lately and the Flights just keep getting DELAYED. Honestly, traveling for WORK gets exhausting with endless delays, but every trip teaches you something new!"
# Lowercase the review
lower_text = ____
# Tokenize the lower_text into words
tokens = ____
# Remove stop words and punctuation
clean_tokens = [____ if word ____ and word ____]
print(clean_tokens)
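One possible way to complete the sample code is sketched below. It is not necessarily the course's reference solution. The exercise environment already supplies word_tokenize and stop_words; the try/except block is an assumption added here so that the sketch also runs where NLTK or its data is missing, in which case a simple regex tokenizer and a tiny hand-picked stop list act as stand-ins.

```python
import re
import string

review = ("I have been FLYING a lot lately and the Flights just keep getting DELAYED. "
          "Honestly, traveling for WORK gets exhausting with endless delays, "
          "but every trip teaches you something new!")

# The exercise provides word_tokenize and stop_words.
# Assumption: outside that environment they may be unavailable,
# so fall back to illustrative stand-ins.
try:
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    stop_words = stopwords.words("english")
    word_tokenize("data check")  # raises LookupError if tokenizer data is absent
except (ImportError, LookupError):
    # Hypothetical stand-in stop list, far smaller than NLTK's
    stop_words = ["i", "have", "been", "a", "and", "the", "just",
                  "for", "with", "but", "you"]

    def word_tokenize(text):
        # Stand-in tokenizer: runs of word characters, or single punctuation marks
        return re.findall(r"\w+|[^\w\s]", text)

# Lowercase the review
lower_text = review.lower()
# Tokenize the lower_text into words
tokens = word_tokenize(lower_text)
# Remove stop words and punctuation
clean_tokens = [word for word in tokens
                if word not in stop_words and word not in string.punctuation]
print(clean_tokens)
```

Lowercasing before filtering matters here: the stop word list is lowercase, so a capitalized token such as "I" would not match it unless the text had been lowercased first.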