Word tokenization with NLTK
Here, you'll be using the first scene of Monty Python's Holy Grail, which has been pre-loaded as scene_one. Feel free to check it out in the IPython Shell!
Your job in this exercise is to use word_tokenize and sent_tokenize from nltk.tokenize to tokenize both words and sentences from Python strings: in this case, the first scene of Monty Python's Holy Grail.
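To get a quick feel for the two functions before tackling the scene, here is a minimal sketch on a made-up sample string (it assumes the NLTK 'punkt' tokenizer data is available; if not, run nltk.download('punkt') once):
from nltk.tokenize import sent_tokenize, word_tokenize
text = "Hello there! This is a short example."
# sent_tokenize splits the string into a list of sentences
print(sent_tokenize(text))   # ['Hello there!', 'This is a short example.']
# word_tokenize splits the string into word and punctuation tokens
print(word_tokenize(text))   # ['Hello', 'there', '!', 'This', 'is', 'a', 'short', 'example', '.']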
This exercise is part of the course
Introduction to Natural Language Processing in Python
Exercise instructions
- Import the sent_tokenize and word_tokenize functions from nltk.tokenize.
- Tokenize all the sentences in scene_one using the sent_tokenize() function.
- Tokenize the fourth sentence in sentences, which you can access as sentences[3], using the word_tokenize() function.
- Find the unique tokens in the entire scene by using word_tokenize() on scene_one and then converting it into a set using set().
- Print the unique tokens found. This has been done for you, so hit 'Submit Answer' to see the results!
Hands-on interactive exercise
Try this exercise by filling in the sample code below.
# Import necessary modules
____
____
# Split scene_one into sentences: sentences
sentences = ____(____)
# Use word_tokenize to tokenize the fourth sentence: tokenized_sent
tokenized_sent = ____(____[_])
# Make a set of unique tokens in the entire scene: unique_tokens
unique_tokens = ____(____(____))
# Print the unique tokens result
print(unique_tokens)
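One possible completed version of the template, assuming scene_one has been pre-loaded as described above:
# Import necessary modules
from nltk.tokenize import sent_tokenize, word_tokenize
# Split scene_one into sentences: sentences
sentences = sent_tokenize(scene_one)
# Use word_tokenize to tokenize the fourth sentence: tokenized_sent
tokenized_sent = word_tokenize(sentences[3])
# Make a set of unique tokens in the entire scene: unique_tokens
unique_tokens = set(word_tokenize(scene_one))
# Print the unique tokens result
print(unique_tokens)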