Regex with NLTK tokenization
Twitter is a frequently used source of text for NLP tasks. In this exercise, you'll build a more complex tokenizer for tweets with hashtags and mentions using nltk and regex. The nltk.tokenize.TweetTokenizer class gives you some extra methods and attributes for parsing tweets.
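To see what TweetTokenizer does differently, here is a minimal sketch on a single made-up tweet. The variable names and the tweet text are hypothetical. The constructor options shown (preserve_case, reduce_len, strip_handles) are existing TweetTokenizer parameters.

```python
from nltk.tokenize import TweetTokenizer

# A made-up tweet (hypothetical, not taken from the preloaded `tweets` variable)
tweet = "@DataCamp loving #NLP sooooo much :)"

# Default settings keep @mentions, #hashtags and emoticons as single tokens
default_tokens = TweetTokenizer().tokenize(tweet)

# strip_handles=True drops @mentions, reduce_len=True shortens runs of
# 3+ repeated characters to exactly 3, preserve_case=False lowercases
# every token except emoticons
custom = TweetTokenizer(preserve_case=False, reduce_len=True, strip_handles=True)
custom_tokens = custom.tokenize(tweet)

print(default_tokens)  # ['@DataCamp', 'loving', '#NLP', 'sooooo', 'much', ':)']
print(custom_tokens)   # ['loving', '#nlp', 'sooo', 'much', ':)']
```

A plain whitespace or word tokenizer would split '#NLP' into '#' and 'NLP' and break ':)' apart, which is why a tweet-aware tokenizer is useful here.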
Here, you're given some example tweets to parse using both TweetTokenizer and regexp_tokenize from the nltk.tokenize module. These example tweets have been pre-loaded into the variable tweets. Feel free to explore them in the IPython Shell!
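The sketch below shows both approaches side by side. Because the contents of the preloaded tweets variable aren't shown here, it uses a hypothetical sample_tweets list instead. The patterns r"#\w+" (hashtags) and r"[@#]\w+" (mentions and hashtags) are illustrative choices, not the exercise's official answers.

```python
from nltk.tokenize import regexp_tokenize, TweetTokenizer

# Hypothetical stand-ins for the preloaded `tweets` variable
sample_tweets = [
    "Learning #NLP with @DataCamp today",
    "Regex is hard but fun #regex #python",
]

# Illustrative regex patterns: hashtags only, then mentions and hashtags
hashtag_pattern = r"#\w+"
mention_hashtag_pattern = r"[@#]\w+"

# regexp_tokenize returns every non-overlapping match, in order of appearance
hashtags = regexp_tokenize(sample_tweets[0], hashtag_pattern)
mentions_hashtags = regexp_tokenize(sample_tweets[0], mention_hashtag_pattern)

# TweetTokenizer splits each whole tweet into tokens, keeping #hashtags and @mentions intact
tknzr = TweetTokenizer()
all_tokens = [tknzr.tokenize(t) for t in sample_tweets]

print(hashtags)           # ['#NLP']
print(mentions_hashtags)  # ['#NLP', '@DataCamp']
print(all_tokens)
```

The regex route extracts only the pieces you ask for, while TweetTokenizer returns every token in the tweet.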
Unlike the syntax of Python's re module (for example, re.findall(pattern, string)), with regexp_tokenize() you pass the text first and the pattern as the second argument.
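A quick way to internalize the argument order is to run the same pattern through both functions. In this minimal sketch, the text and pattern are hypothetical examples. For a pattern without capturing groups, both calls return the same list of matches.

```python
import re
from nltk.tokenize import regexp_tokenize

text = "Tweets often contain #hashtags and @mentions"
pattern = r"[#@]\w+"

# re.findall takes the pattern first, then the string
with_re = re.findall(pattern, text)

# nltk's regexp_tokenize takes the text first, then the pattern
with_nltk = regexp_tokenize(text, pattern)

print(with_re)    # ['#hashtags', '@mentions']
print(with_nltk)  # ['#hashtags', '@mentions']
```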
This exercise is part of the course Introduction to Natural Language Processing in Python.
Have a go at this exercise by completing this sample code.
# Import the necessary modules
____
____