
Tokenizing sentences with Keras

The Keras Tokenizer is a handy utility that performs essential text preprocessing in a few lines of code. For example, it automatically maps the words in your vocabulary to integer IDs with a single function call. In this exercise you will get hands-on practice with the Tokenizer and look at this mapping in more detail.
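Before any IDs are assigned, the Tokenizer normalizes each text. With its default settings it lowercases the text, replaces every character in its `filters` string (most ASCII punctuation, with the apostrophe as a notable exception, plus tab and newline) with a space, and splits on spaces. The sketch below re-implements that default behavior in plain Python so you can see what happens; it is a simplified illustration, not the Keras source, and `split_words` is a hypothetical helper name.

```python
# Assumed to match the default value of the Keras Tokenizer's `filters` argument.
DEFAULT_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def split_words(text, filters=DEFAULT_FILTERS, lower=True, split=" "):
    """Hypothetical helper mimicking the Tokenizer's default word splitting."""
    if lower:
        text = text.lower()
    # Replace every filtered character with the split character.
    text = text.translate(str.maketrans({c: split for c in filters}))
    # Drop the empty strings produced by consecutive separators.
    return [w for w in text.split(split) if w]


print(split_words("The weather in January is cold, but summer is warm!"))
```

Because of this normalization, "January" and "january" end up as the same vocabulary entry, and trailing punctuation such as "cold," does not create a separate word.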

You will create a Keras Tokenizer object and fit it on some text, which lets the Tokenizer build a dictionary of words and their corresponding IDs. The text used to fit the Tokenizer is taken from the Udacity GitHub repo.
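Fitting is essentially a counting pass. Under the default settings, `fit_on_texts` counts every word across all texts, sorts the words by count (most frequent first, with ties kept in order of first appearance), and numbers them starting from 1; index 0 is reserved and never assigned to a word. The plain-Python sketch below, using a hypothetical `build_word_index` function and made-up sentences, mirrors that ranking logic. In Keras the resulting dictionary is available as the tokenizer's `word_index` attribute.

```python
from collections import Counter


def build_word_index(texts):
    """Hypothetical helper: rank words by frequency and number them from 1."""
    counts = Counter()
    for text in texts:
        # Simplified splitting: lowercase and split on whitespace only.
        counts.update(text.lower().split())
    # sorted() is stable even with reverse=True, so equally frequent words
    # keep their first-seen order (Counter preserves insertion order).
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return {word: index for index, (word, _) in enumerate(ranked, start=1)}


texts = ["it is cold in january", "it is warm in summer", "she likes apples"]
word_index = build_word_index(texts)
print(word_index)
```

Here "it", "is" and "in" each occur twice, so they receive the smallest IDs (1, 2 and 3), while words seen only once follow in order of first appearance.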

This exercise is part of the course

Machine Translation with Keras


Exercise instructions

  • Define a Keras Tokenizer object.
  • Fit the tokenizer on en_text.
  • Get the word ID for each word w in the given list ["january", "apples", "summer"].
  • Print the word and its corresponding ID.

Hands-on interactive exercise

Try this exercise by completing the sample code below.

from tensorflow.keras.preprocessing.text import Tokenizer

# Define a Keras Tokenizer
en_tok = ____

# Fit the tokenizer on some text
en_tok.____(____)

for w in ["january", "apples", "summer"]:
  # Get the word ID of word w
  id = en_tok.____[____]
  # Print the word and the word ID
  print(____, " has id: ", _____)
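The loop in the exercise indexes `word_index` directly, so a word that never appeared in the fitted text raises a `KeyError`. The sketch below uses a small hand-written dictionary standing in for a fitted `en_tok.word_index` (the IDs are made up) to show a safer lookup with `dict.get`, plus a hypothetical `to_sequence` helper that, like `texts_to_sequences` on a Tokenizer created without an `oov_token`, silently skips unknown words. If the Tokenizer is created with an `oov_token`, unknown words are instead mapped to that token's ID. The sketch also names the variable `word_id` rather than `id`, which would shadow Python's built-in `id()` function.

```python
# Made-up IDs standing in for a fitted tokenizer's word_index.
word_index = {"is": 1, "in": 2, "it": 3, "january": 4, "summer": 5}

for w in ["january", "apples", "summer"]:
    # dict.get returns None instead of raising KeyError for unseen words.
    word_id = word_index.get(w)
    if word_id is None:
        print(w, "is not in the vocabulary")
    else:
        print(w, "has id:", word_id)


def to_sequence(text, word_index):
    """Hypothetical helper: map words to IDs, skipping unknown words."""
    return [word_index[w] for w in text.lower().split() if w in word_index]


print(to_sequence("It is cold in January", word_index))
```

With these made-up IDs, "cold" is not in the dictionary, so the sentence becomes [3, 1, 2, 4] with one word silently lost, which is why an out-of-vocabulary token is often preferred in practice.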