Lowercasing
You're analyzing user reviews for a travel website. These reviews often contain inconsistent capitalization, such as "TRAVEL" and "travel". To prepare the text for sentiment analysis and topic extraction, you'll first convert the text to lowercase, then tokenize it and remove stop words and punctuation.
The word_tokenize() function and a stop_words list have been provided. NLTK resources are already downloaded.
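To see why inconsistent capitalization matters, here is a small illustrative sketch. The word list is made up for the example and is not part of the exercise data. It counts words before and after lowercasing, using only the standard library.

```python
from collections import Counter

# Hypothetical sample words, invented for illustration only
words = ["TRAVEL", "travel", "Travel", "delay"]

# Without lowercasing, each capitalization is counted as a different word
raw_counts = Counter(words)

# After lowercasing, all three variants collapse into one entry
lower_counts = Counter(w.lower() for w in words)

print(raw_counts["travel"])    # 1
print(lower_counts["travel"])  # 3
```

Any later step that relies on word frequencies, such as topic extraction, would otherwise treat the three spellings as unrelated terms.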
This exercise is part of the course
Natural Language Processing (NLP) in Python
Exercise instructions
- Convert the provided review into lowercase.
- Tokenize the lower_text into words.
- Use list comprehension to remove stop words and punctuation using the lists of stop_words and string.punctuation.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
review = "I have been FLYING a lot lately and the Flights just keep getting DELAYED. Honestly, traveling for WORK gets exhausting with endless delays, but every trip teaches you something new!"
# Lowercase the review
lower_text = ____
# Tokenize the lower_text into words
tokens = ____
# Remove stop words and punctuation
clean_tokens = [____ if word ____ and word ____]
print(clean_tokens)
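One possible way to fill in the blanks is sketched below. So that it runs without NLTK, it defines stand-ins for the two provided objects: the word_tokenize() here is a simplified regex-based substitute, not NLTK's tokenizer, and the stop_words set is a short hypothetical list rather than the full one supplied by the exercise. In the exercise environment, the three completed lines should work the same way with the provided versions.

```python
import re
import string

# ASSUMPTION: hypothetical, abbreviated stop-word set standing in for the provided stop_words
stop_words = {"i", "have", "been", "a", "and", "the", "just", "for", "with", "but", "you"}

def word_tokenize(text):
    # ASSUMPTION: simplified stand-in for NLTK's word_tokenize();
    # splits text into runs of word characters and single punctuation marks
    return re.findall(r"\w+|[^\w\s]", text)

review = "I have been FLYING a lot lately and the Flights just keep getting DELAYED. Honestly, traveling for WORK gets exhausting with endless delays, but every trip teaches you something new!"

# Lowercase the review
lower_text = review.lower()

# Tokenize the lower_text into words
tokens = word_tokenize(lower_text)

# Remove stop words and punctuation
clean_tokens = [word for word in tokens
                if word not in stop_words and word not in string.punctuation]

print(clean_tokens)
```

Lowercasing comes first because stop-word lists are typically lowercase, so a token like "The" would slip past the filter if the text were tokenized without it. Note that string.punctuation is a string, so the membership test reliably removes single-character punctuation tokens; multi-character tokens such as "..." would need separate handling.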