Session Ready
Exercise

Regex with NLTK tokenization

Twitter is a frequently used source for NLP text and tasks. In this exercise, you'll build a more complex tokenizer for tweets with hashtags and mentions using nltk and regex. The nltk.tokenize.TweetTokenizer class gives you some extra methods and attributes for parsing tweets.

Here, you're given some example tweets to parse using both TweetTokenizer and regexp_tokenize from the nltk.tokenize module. These example tweets have been pre-loaded into the variable tweets. Feel free to explore it in the IPython Shell!

Unlike the syntax for the regex library, with nltk_tokenize() you pass the pattern as the second argument.

Instructions 1/4
undefined XP
  • 1
  • 2
  • 3
  • 4
  • From nltk.tokenize, import regexp_tokenize and TweetTokenizer.