
Lowercasing

You're analyzing user reviews for a travel website. These reviews often use inconsistent capitalization, such as "TRAVEL" and "travel", which would otherwise be treated as different words. To prepare the text for sentiment analysis and topic extraction, you'll first convert the text to lowercase, then tokenize it and remove stop words and punctuation.

The word_tokenize() function and a stop_words list have been provided. NLTK resources are already downloaded.
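
As a small illustration of why lowercasing matters, the sketch below counts a few capitalization variants before and after calling lower(). The word list is made up for this example and is not part of the exercise data.

```python
from collections import Counter

# Illustrative word list (not from the exercise): same word, different casing
words = ["TRAVEL", "travel", "Travel", "trip"]

# Without lowercasing, each capitalization is counted as a separate word
raw_counts = Counter(words)

# After lowercasing, the variants collapse into a single entry
lower_counts = Counter(word.lower() for word in words)

print(raw_counts)
print(lower_counts)
```

After lowercasing, all three spellings of "travel" fall under one key, which is what later steps such as frequency counts or topic extraction generally expect.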

This exercise is part of the course

Natural Language Processing (NLP) in Python


Exercise instructions

  • Convert the provided review into lowercase.
  • Tokenize the lower_text into words.
  • Use a list comprehension to remove stop words and punctuation, checking each token against stop_words and string.punctuation.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

review = "I have been FLYING a lot lately and the Flights just keep getting DELAYED. Honestly, traveling for WORK gets exhausting with endless delays, but every trip teaches you something new!"

# Lowercase the review
lower_text = ____

# Tokenize the lower_text into words
tokens = ____

# Remove stop words and punctuation
clean_tokens = [____ if word ____ and word ____]

print(clean_tokens)
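
One possible way to complete the three steps is sketched below. To keep it runnable without NLTK, it makes two assumptions that differ from the exercise environment: simple_tokenize() is a hypothetical regex-based stand-in for word_tokenize(), and stop_words is a small hand-picked set rather than the provided list. With the real word_tokenize() and stop_words, the resulting tokens may differ.

```python
import re
import string

# Assumption: a tiny stand-in for the provided stop_words list (the real one is larger)
stop_words = {"i", "have", "been", "a", "and", "the", "just",
              "for", "with", "but", "you"}


def simple_tokenize(text):
    """Hypothetical stand-in for word_tokenize(): words and punctuation marks as separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


review = ("I have been FLYING a lot lately and the Flights just keep getting DELAYED. "
          "Honestly, traveling for WORK gets exhausting with endless delays, "
          "but every trip teaches you something new!")

# Lowercase the review
lower_text = review.lower()

# Tokenize the lower_text into words
tokens = simple_tokenize(lower_text)

# Remove stop words and punctuation
clean_tokens = [word for word in tokens
                if word not in stop_words and word not in string.punctuation]

print(clean_tokens)
```

Lowercasing comes first so that tokens such as "FLYING" and "Flights" match lowercase entries in the stop word list and end up in a consistent form. The check against string.punctuation works here because each punctuation mark is its own single-character token.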