Tokenizing sentences with Keras

Here you will get your hands dirty with the Keras Tokenizer, a utility that handles essential text preprocessing in a few lines of code. For example, once it has been fitted on some text, the Keras Tokenizer maps each word in your vocabulary to an integer ID with a single function call. You will learn about this in more detail in this exercise.

You will create a Keras Tokenizer object and fit it on some text, which lets the Tokenizer build a dictionary of words and their corresponding IDs. The text used to fit the Tokenizer is obtained from the Udacity GitHub repo.
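
To make the idea of "a dictionary of words and their corresponding IDs" concrete, here is a minimal pure-Python sketch of roughly what fitting does. It is not the Keras implementation: the helper build_word_index and the toy sentences are made up for illustration, and the punctuation handling only approximates the Tokenizer's default filters. It does follow the same general convention of lowercasing the text, giving more frequent words smaller IDs, and starting IDs at 1.

```python
import string
from collections import Counter

def build_word_index(texts):
    """Map each word to an integer ID; more frequent words get smaller IDs."""
    # Replace punctuation with spaces (an approximation of the Keras defaults)
    to_space = str.maketrans(string.punctuation, " " * len(string.punctuation))
    counts = Counter()
    for text in texts:
        counts.update(text.lower().translate(to_space).split())
    # most_common() sorts by descending count; ties keep first-seen order.
    # IDs start at 1, leaving 0 free (commonly used for padding).
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

# Hypothetical toy corpus, not the exercise data
toy_text = ["I like apples in summer.", "January is cold, summer is warm."]
word_index = build_word_index(toy_text)

for w in ["january", "apples", "summer"]:
    print(w, "has id:", word_index[w])
```

Because "summer" appears twice in the toy corpus, it receives the smallest ID of the three words.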

This exercise is part of the course

Machine Translation with Keras

Exercise instructions

  • Define a Keras Tokenizer object.
  • Fit the tokenizer on en_text.
  • Get the word ID for each word w in the given list ["january", "apples", "summer"].
  • Print the word and its corresponding ID.
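
A fitted Tokenizer exposes its vocabulary as a regular Python dict, so looking up a word that was never seen during fitting with square brackets raises a KeyError. The sketch below shows one way to guard against that. The dict here is a hand-written stand-in for a fitted tokenizer's vocabulary, and lookup_ids is a hypothetical helper, not part of Keras.

```python
# Hypothetical stand-in for the word-to-ID dict of a fitted tokenizer
word_index = {"summer": 1, "apples": 2, "january": 3}

def lookup_ids(words, word_index):
    """Return (word, ID) pairs; words missing from the vocabulary get None."""
    # Lowercase the query because the vocabulary keys are lowercase
    return [(w, word_index.get(w.lower())) for w in words]

for word, word_id in lookup_ids(["january", "apples", "winter"], word_index):
    print(word, "has id:", word_id)
```

Using dict.get instead of square brackets trades an exception for a None value, which is convenient when the word list may contain out-of-vocabulary entries.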

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

from tensorflow.keras.preprocessing.text import Tokenizer

# Define a Keras Tokenizer
en_tok = ____

# Fit the tokenizer on some text
en_tok.____(____)

for w in ["january", "apples", "summer"]:
  # Get the word ID of word w
  id = en_tok.____[____]
  # Print the word and the word ID
  print(____, " has id: ", ____)