Word tokenization with NLTK
Here, you'll be using the first scene of Monty Python's Holy Grail, which has been pre-loaded as scene_one. Feel free to check it out in the IPython Shell!
Your job in this exercise is to use word_tokenize and sent_tokenize from nltk.tokenize to tokenize both words and sentences from Python strings, in this case the first scene of Monty Python's Holy Grail.
This exercise is part of the course
Introduction to Natural Language Processing in Python
Exercise instructions
- Import the sent_tokenize and word_tokenize functions from nltk.tokenize.
- Tokenize all the sentences in scene_one using the sent_tokenize() function.
- Tokenize the fourth sentence in sentences, which you can access as sentences[3], using the word_tokenize() function.
- Find the unique tokens in the entire scene by using word_tokenize() on scene_one and then converting it into a set using set().
- Print the unique tokens found. This has been done for you, so hit 'Submit Answer' to see the results!
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import necessary modules
____
____
# Split scene_one into sentences: sentences
sentences = ____(____)
# Use word_tokenize to tokenize the fourth sentence: tokenized_sent
tokenized_sent = ____(____[_])
# Make a set of unique tokens in the entire scene: unique_tokens
unique_tokens = ____(____(____))
# Print the unique tokens result
print(unique_tokens)