Regex with NLTK tokenization

Twitter is a frequently used source of text for NLP tasks. In this exercise, you'll build a more complex tokenizer for tweets with hashtags and mentions using nltk and regex. The nltk.tokenize.TweetTokenizer class gives you some extra methods and attributes for parsing tweets.
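To get a feel for what TweetTokenizer does before starting, here is a minimal sketch. The sample tweet is made up for illustration and is not one of the exercise's pre-loaded tweets; strip_handles and preserve_case are optional constructor arguments of TweetTokenizer.

```python
from nltk.tokenize import TweetTokenizer

# Hypothetical sample tweet, not taken from the exercise's `tweets` variable.
sample_tweet = "Thanks @datacamp :) #nlp #python"

# With default settings, mentions, hashtags, and emoticons stay whole tokens.
tknzr = TweetTokenizer()
tokens = tknzr.tokenize(sample_tweet)
print(tokens)

# strip_handles=True drops @mentions; preserve_case=False lowercases words.
clean_tknzr = TweetTokenizer(strip_handles=True, preserve_case=False)
clean_tokens = clean_tknzr.tokenize(sample_tweet)
print(clean_tokens)
```

Neither call needs any downloaded NLTK data, since TweetTokenizer is rule-based.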

Here, you're given some example tweets to parse using both TweetTokenizer and regexp_tokenize from the nltk.tokenize module. These example tweets have been pre-loaded into the variable tweets. Feel free to explore it in the IPython Shell!

Unlike the functions in Python's re module, which take the pattern first, regexp_tokenize() takes the text first and the pattern as the second argument.
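The difference in argument order can be sketched as follows. The sample tweet and the pattern are assumptions chosen for illustration, not the exercise's own data or solution.

```python
import re
from nltk.tokenize import regexp_tokenize

# Hypothetical sample tweet and pattern, chosen only for illustration.
sample_tweet = "This is the best #nlp exercise I've found online! #python @datacamp"
pattern = r"[@#]\w+"  # a '#' or '@' followed by one or more word characters

# re.findall takes the pattern first, then the text.
re_matches = re.findall(pattern, sample_tweet)

# regexp_tokenize takes the text first, then the pattern.
nltk_matches = regexp_tokenize(sample_tweet, pattern)

print(re_matches)
print(nltk_matches)
```

The pattern uses a character class rather than a capturing group, so both calls return the full matched hashtags and mentions.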

This exercise is part of the course

Introduction to Natural Language Processing in Python


Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import the necessary modules
____
____
