Word tokenization with NLTK
Here, you'll be using the first scene of Monty Python's Holy Grail, which has been pre-loaded as scene_one. Feel free to check it out in the IPython Shell!
Your job in this exercise is to use word_tokenize and sent_tokenize from nltk.tokenize to tokenize both words and sentences from Python strings - in this case, the first scene of Monty Python's Holy Grail.
This exercise is part of the course Introduction to Natural Language Processing in Python.
Exercise instructions
- Import the sent_tokenize and word_tokenize functions from nltk.tokenize.
- Tokenize all the sentences in scene_one using the sent_tokenize() function.
- Tokenize the fourth sentence in sentences, which you can access as sentences[3], using the word_tokenize() function.
- Find the unique tokens in the entire scene by using word_tokenize() on scene_one and then converting it into a set using set().
- Print the unique tokens found. This has been done for you, so hit 'Submit Answer' to see the results!
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import necessary modules
____
____
# Split scene_one into sentences: sentences
sentences = ____(____)
# Use word_tokenize to tokenize the fourth sentence: tokenized_sent
tokenized_sent = ____(____[_])
# Make a set of unique tokens in the entire scene: unique_tokens
unique_tokens = ____(____(____))
# Print the unique tokens result
print(unique_tokens)