
Word tokenization with NLTK

Here, you'll be using the first scene of Monty Python's Holy Grail, which has been pre-loaded as scene_one. Feel free to check it out in the IPython Shell!

Your job in this exercise is to utilize word_tokenize and sent_tokenize from nltk.tokenize to tokenize both words and sentences from Python strings - in this case, the first scene of Monty Python's Holy Grail.
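If you haven't used these tokenizers before, here is a minimal sketch of how they behave on a small string (the example text is illustrative, and it assumes the NLTK punkt models have already been downloaded with nltk.download('punkt')):

# Minimal illustration of NLTK's sentence and word tokenizers
from nltk.tokenize import sent_tokenize, word_tokenize

text = "Hello there! This is a short example. Tokenize it."

# Split the string into sentences
print(sent_tokenize(text))
# ['Hello there!', 'This is a short example.', 'Tokenize it.']

# Split the string into word-level tokens (punctuation becomes its own token)
print(word_tokenize(text))
# ['Hello', 'there', '!', 'This', 'is', 'a', 'short', 'example', '.', 'Tokenize', 'it', '.']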

This exercise is part of the course

Introduction to Natural Language Processing in Python


Exercise instructions

  • Import the sent_tokenize and word_tokenize functions from nltk.tokenize.
  • Tokenize all the sentences in scene_one using the sent_tokenize() function.
  • Tokenize the fourth sentence in sentences, which you can access as sentences[3], using the word_tokenize() function.
  • Find the unique tokens in the entire scene by using word_tokenize() on scene_one and then converting it into a set using set().
  • Print the unique tokens found. This has been done for you, so hit 'Submit Answer' to see the results!

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Import necessary modules
____
____

# Split scene_one into sentences: sentences
sentences = ____(____)

# Use word_tokenize to tokenize the fourth sentence: tokenized_sent
tokenized_sent = ____(____[_])

# Make a set of unique tokens in the entire scene: unique_tokens
unique_tokens = ____(____(____))

# Print the unique tokens result
print(unique_tokens)
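One possible completion of the template is sketched below. It assumes, as stated above, that scene_one is pre-loaded as a string and that the NLTK punkt models are available:

# Import necessary modules
from nltk.tokenize import sent_tokenize, word_tokenize

# Split scene_one into sentences: sentences
sentences = sent_tokenize(scene_one)

# Use word_tokenize to tokenize the fourth sentence: tokenized_sent
tokenized_sent = word_tokenize(sentences[3])

# Make a set of unique tokens in the entire scene: unique_tokens
unique_tokens = set(word_tokenize(scene_one))

# Print the unique tokens result
print(unique_tokens)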